Yeah, CatalystScan should give you everything we can possibly push down, in raw form. Note that this is not compatible across different Spark versions.
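For reference, the practical difference is the `buildScan` signature: `CatalystScan` hands the relation raw Catalyst `Expression`s, while `PrunedFilteredScan` only passes the simplified `sources.Filter` subset, which is why UDF predicates never show up there. A minimal sketch against the Spark 1.5-era API (the relation name and body are illustrative, not a real implementation):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.sources.{BaseRelation, CatalystScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation. The key point is the buildScan signature:
// CatalystScan receives raw Catalyst Expressions, which can include
// predicates over UDF results or exploded columns, whereas
// PrunedFilteredScan's buildScan(Array[String], Array[Filter])
// only ever sees the simplified source filters.
class EsBackedRelation(val sqlContext: SQLContext,
                       override val schema: StructType)
    extends BaseRelation with CatalystScan {

  override def buildScan(requiredColumns: Seq[Attribute],
                         filters: Seq[Expression]): RDD[Row] = {
    // Inspect `filters` here to build the ElasticSearch query,
    // then do the actual column reads (e.g. from Parquet).
    ???
  }
}
```

As noted above, the trade-off is that `Expression` is a private, unstable API, so a relation written this way can break between Spark versions.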
On Thu, Nov 19, 2015 at 8:55 AM, james.gre...@baesystems.com
<james.gre...@baesystems.com> wrote:

> Thanks Hao
>
> I have written a new Data Source based on ParquetRelation, and I have just
> retested what I had said about not getting anything extra when I change it
> over to a CatalystScan instead of a PrunedFilteredScan, and, oops, it seems
> to work fine.
>
> *From:* Cheng, Hao [mailto:hao.ch...@intel.com]
> *Sent:* 19 November 2015 15:30
> *To:* Green, James (UK Guildford); dev@spark.apache.org
> *Subject:* RE: new datasource
>
> I think you probably need to write some code, as you need to support ES.
> There are two options, per my understanding:
>
> Create a new Data Source from scratch, but you will probably need to
> implement the interface at:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L751
>
> Or you can reuse most of the code in ParquetRelation in the new DataSource,
> but you will also need to add your own logic; see:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala#L285
>
> Hope this helps.
>
> Hao
>
> *From:* james.gre...@baesystems.com [mailto:james.gre...@baesystems.com
> <james.gre...@baesystems.com>]
> *Sent:* Thursday, November 19, 2015 11:14 PM
> *To:* dev@spark.apache.org
> *Subject:* new datasource
>
> We have written a new Spark DataSource that uses both Parquet and
> ElasticSearch. It is based on the existing Parquet DataSource. When I look
> at the filters being pushed down to buildScan, I don't get anything
> representing filters based on UDFs, or on fields generated by an explode.
> I had thought that if I made it a CatalystScan I would get everything I
> needed.
> This is fine from the Parquet point of view, but we are using ElasticSearch
> to index/filter the data we are searching, and I need to be able to capture
> the UDF conditions, or have access to the plan AST, so that I can construct
> a query for ElasticSearch.
>
> I am thinking I might just need to patch Spark to do this, but I'd prefer
> not to if there is a way of getting round this without hacking the core
> code. Any ideas?
>
> Thanks
>
> James
>
> Please consider the environment before printing this email. This message
> should be regarded as confidential. If you have received this email in
> error please notify the sender and destroy it immediately. Statements of
> intent shall only become binding when confirmed in hard copy by an
> authorised signatory. The contents of this email may relate to dealings
> with other companies under the control of BAE Systems Applied Intelligence
> Limited, details of which can be found at
> http://www.baesystems.com/Businesses/index.htm.
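One way to approach the question in the thread without patching Spark is to pattern match over the Catalyst `Expression`s handed to a `CatalystScan` and translate only the predicates ElasticSearch can serve, leaving everything untranslatable for Spark to re-evaluate after the scan. A rough sketch, assuming the Spark 1.5-era `catalyst.expressions` classes (the translator name and the hand-built JSON fragments are illustrative, not a real es-hadoop API):

```scala
import org.apache.spark.sql.catalyst.expressions._

// Hypothetical translator: walk a pushed-down Expression and emit an
// ES query-DSL fragment where possible. Returning None means "cannot
// push this down"; Spark must then apply the predicate itself.
def toEsFilter(e: Expression): Option[String] = e match {
  case EqualTo(a: AttributeReference, Literal(v, _)) =>
    Some(s"""{"term":{"${a.name}":"$v"}}""")
  case And(left, right) =>
    for (lf <- toEsFilter(left); rf <- toEsFilter(right))
      yield s"""{"bool":{"must":[$lf,$rf]}}"""
  case _ =>
    // e.g. a ScalaUDF node: either translate it to a known ES query
    // here, or return None and let Spark evaluate it post-scan.
    None
}
```

Because the raw UDF and explode expressions do reach a `CatalystScan` (as confirmed at the top of the thread), a translator along these lines can capture them without any change to Spark core.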