[ 
https://issues.apache.org/jira/browse/SPARK-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Nitschinger updated SPARK-8655:
---------------------------------------
    Description: 
I'm working on a custom data source, porting it from 1.3 to 1.4.

On 1.3 I could easily extend the SparkSQL imports and get access to the 
SQLContext, which meant I could use custom options right away. One of those is 
a Filter that I pass down to my Relation for tighter schema inference against a 
schemaless database.

So I would have something like:

n1ql(filter: Filter = null, userSchema: StructType = null, bucketName: String = null)

Since I want to move my API behind the DataFrameReader, the SQLContext is no 
longer available directly, only through the RelationProvider, which I've 
implemented and which works nicely.

The only problem I have now is that while I can pass in custom options, they 
are all String typed. So I have no way to pass down my optional Filter anymore 
(since parameters is a Map[String, String]).

Would it be possible to extend the options so that more than just Strings can 
be passed in? Right now I probably need to work around that by documenting how 
people can pass in a string which I turn into a Filter, but that's somewhat 
hacky.
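
To make that string workaround concrete, here is a minimal sketch. It assumes a 
hypothetical "attribute=value" encoding that only covers equality filters; the 
FilterOption object and the "schemaFilter" option key are made up for 
illustration and are not part of Spark. Only Filter and EqualTo come from 
org.apache.spark.sql.sources.

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter}

// Hypothetical helper (not part of Spark): round-trips a single equality
// filter through the String-typed options map.
object FilterOption {
  // Option key chosen for this sketch only.
  val Key = "schemaFilter"

  // Encode an equality filter as "attribute=value".
  def encode(filter: EqualTo): String = s"${filter.attribute}=${filter.value}"

  // Decode "attribute=value" back into a Filter. Returns None when the string
  // has no '=' or the attribute name is empty.
  def parse(raw: String): Option[Filter] = raw.split("=", 2) match {
    case Array(attr, value) if attr.nonEmpty => Some(EqualTo(attr, value))
    case _ => None
  }

  // Look the filter up in the parameters that a RelationProvider receives.
  def fromParameters(parameters: Map[String, String]): Option[Filter] =
    parameters.get(Key).flatMap(parse)
}
```

A caller would pass .option(FilterOption.Key, "type=airline") on the 
DataFrameReader, and createRelation would call 
FilterOption.fromParameters(parameters). Note that the decoded value is always a 
String, so the original value type is lost, which is part of why this feels 
hacky.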

Note that built-in impls like JSON or JDBC have no issues: since they can 
access the SQLContext (private) directly, they don't need to go through the 
decoupling of the RelationProvider and can take any custom arguments they want 
on their methods.

  was:
I'm working on a custom data source, porting it from 1.3 to 1.4.

On 1.3 I could easily extend the SparkSQL imports and get access to it, which 
meant I could use custom options right away. One of those is I pass a Filter 
down to my Relation for tighter schema inference against a schemaless database.

So I would have something like:

n1ql(filter: Filter = null, userSchema: StructType = null, bucketName: String = 
null)

Since I want to move my API behind the DataFrameReader, the SQLContext is not 
available anymore, only through the RelationProvider, which I've implemented 
and it works nicely.

The only problem I have now is that while I can pass in custom options, they 
are all String typed. So I have no way to pass down my optional Filter anymore 
(since parameters is a Map[String, String]).

Would it be possible to extend the options so that more than just Strings can 
be passed in? Right now I probably need to work around that by documenting how 
people can pass in a string which I turn into a Filter, but that's somewhat 
hacky.


> DataFrameReader#option supports more than String as value
> ---------------------------------------------------------
>
>                 Key: SPARK-8655
>                 URL: https://issues.apache.org/jira/browse/SPARK-8655
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Michael Nitschinger
>
> I'm working on a custom data source, porting it from 1.3 to 1.4.
> On 1.3 I could easily extend the SparkSQL imports and get access to the 
> SQLContext, which meant I could use custom options right away. One of those 
> is a Filter that I pass down to my Relation for tighter schema inference 
> against a schemaless database.
> So I would have something like:
> n1ql(filter: Filter = null, userSchema: StructType = null, bucketName: String = null)
> Since I want to move my API behind the DataFrameReader, the SQLContext is no 
> longer available directly, only through the RelationProvider, which I've 
> implemented and which works nicely.
> The only problem I have now is that while I can pass in custom options, they 
> are all String typed. So I have no way to pass down my optional Filter 
> anymore (since parameters is a Map[String, String]).
> Would it be possible to extend the options so that more than just Strings can 
> be passed in? Right now I probably need to work around that by documenting 
> how people can pass in a string which I turn into a Filter, but that's 
> somewhat hacky.
> Note that built-in impls like JSON or JDBC have no issues: since they can 
> access the SQLContext (private) directly, they don't need to go through the 
> decoupling of the RelationProvider and can take any custom arguments they 
> want on their methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
