I considered adding this to the DataSource API V2 ticket, but I didn't want to be first :P Do you think there would be any issues with opening up the partitioning as well?

On Fri, Jun 16, 2017 at 11:58 AM Reynold Xin <r...@databricks.com> wrote:

> Perhaps we should extend the data source API to support that.
>
> On Fri, Jun 16, 2017 at 11:37 AM, Russell Spitzer <russell.spit...@gmail.com> wrote:
>
>> I've been trying to work with making Catalyst Cassandra partitioning
>> aware. There seem to be two major blocks on this.
>>
>> The first is that DataSourceScanExec is unable to learn what the
>> underlying partitioning should be from the BaseRelation it comes from. I'm
>> currently able to get around this by using the DataSourceStrategy plan and
>> then transforming the resultant DataSourceScanExec.
>>
>> The second is that the Partitioning trait is sealed. I want to define a
>> new partitioning which is Clustered but is not hashed based on certain
>> columns. It would look almost identical to the HashPartitioning class
>> except the expression which returns a valid PartitionID given expressions
>> would be different.
>>
>> Anyone have any ideas on how to get around the second issue? Would it be
>> worth while to make changes to allow BaseRelations to advertise a
>> particular Partitioner?