Re: Spark DataFrames With Cache Key and Value Objects

Stuart Macdonald Fri, 27 Jul 2018 06:32:11 -0700

Here’s the ticket:  

https://issues.apache.org/jira/browse/IGNITE-9108


Stuart.  


On Friday, 27 July 2018 at 14:19, Nikolay Izhikov wrote:

> Sure.
>  
> Please send the ticket number in this thread.
>  
> On Fri, 27 Jul 2018 at 16:16, Stuart Macdonald wrote:
>  
> > Thanks Nikolay. For both options if the cache object isn’t a simple type,
> > we’d probably do something like this in our Ignite SQL statement:
> >  
> > select cast(_key as binary), cast(_val as binary), ...
> >  
> > Which would give us the BinaryObject’s byte[], then for option 1 we keep
> > the Ignite format and introduce a new Spark Encoder for Ignite binary types
> > (https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Encoder.html),
> > so that the end user interface would be something like:
> >  
> > IgniteSparkSession session = ...
> > Dataset<Row> dataFrame = ...
> > Dataset<MyValClass> valDataSet =
> >     dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class));
> >  
> > Or for option 2 we have a behind-the-scenes Ignite-to-Kryo UDF so that the
> > user interface would be standard Spark:
> >  
> > Dataset<Row> dataFrame = ...
> > Dataset<MyValClass> dataSet =
> >     dataFrame.select("_val").as(Encoders.kryo(MyValClass.class));
> >  
> > I’ll create a ticket and maybe put together a test case for further
> > discussion?
> >  
> > Stuart.
> >  
> > On 27 Jul 2018, at 09:50, Nikolay Izhikov wrote:
> >  
> > Hello, Stuart.
> >  
> > I like your idea.
> >  
> > 1. Ignite BinaryObjects, in which case we’d need to supply a Spark Encoder
> > implementation for BinaryObjects
> >  
> > 2. Kryo-serialised versions of the objects.
> >  
> >  
> > It seems the first option is a simple adapter. Am I right?
> > If yes, I think it's more efficient than transforming
> > each object to some other (Kryo) format.
> >  
> > Can you provide some additional links for both options?
> > Where can I find the API and/or examples?
> >  
> > As a second step, we can apply the same approach to regular key-value
> > caches.
> >  
> > Feel free to create a ticket.
> >  
> > On Fri, 27/07/2018 at 09:37 +0100, Stuart Macdonald wrote:
> >  
> > Ignite Dev Community,
> >
> > Within Ignite-supplied Spark DataFrames, I’d like to propose adding support
> > for _key and _val columns which represent the cache key and value objects
> > similar to the current _key/_val column semantics in Ignite SQL.
> >
> > If the cache key or value objects are standard SQL types (eg. String, Int,
> > etc) they will be represented as such in the DataFrame schema, otherwise
> > they are represented as Binary types encoded as either: 1. Ignite
> > BinaryObjects, in which case we’d need to supply a Spark Encoder
> > implementation for BinaryObjects, or 2. Kryo-serialised versions of the
> > objects. Option 1 would probably be more efficient but option 2 would be
> > more idiomatic Spark.
> >
> > This feature would be controlled with an optional parameter in the Ignite
> > data source, defaulting to the current implementation which doesn’t supply
> > _key or _val columns. The rationale behind this is the same as the Ignite
> > SQL _key and _val columns: to allow access to the full cache objects from a
> > SQL context.
> >
> > Can I ask for feedback on this proposal please?
> >
> > I’d be happy to contribute this feature if we agree on the concept.
> >
> > Stuart.
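
As an illustration of the type rule in the original proposal (cache keys and values that are standard SQL types keep that type in the DataFrame schema, everything else becomes a Binary column), here is a minimal Java sketch. The class name KeyValColumnTypes, the method columnTypeFor, the type-name strings and the set of "simple" types are all assumptions made for illustration. None of this is Ignite or Spark API, and the real list of supported SQL types may well differ.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Ignite or Spark.
final class KeyValColumnTypes {
    // Assumed set of "simple" cache types that map directly to a SQL column type.
    private static final Map<Class<?>, String> SIMPLE_TYPES = new HashMap<>();

    static {
        SIMPLE_TYPES.put(String.class, "STRING");
        SIMPLE_TYPES.put(Integer.class, "INTEGER");
        SIMPLE_TYPES.put(Long.class, "LONG");
        SIMPLE_TYPES.put(Double.class, "DOUBLE");
        SIMPLE_TYPES.put(Boolean.class, "BOOLEAN");
        SIMPLE_TYPES.put(BigDecimal.class, "DECIMAL");
    }

    // Returns the assumed column type for a _key or _val column:
    // the matching SQL type for simple classes, BINARY for anything else.
    public static String columnTypeFor(Class<?> cacheObjectClass) {
        return SIMPLE_TYPES.getOrDefault(cacheObjectClass, "BINARY");
    }

    // Example of a user-defined value class that is not a simple SQL type.
    static final class Person {
        String name;
        int age;
    }

    public static void main(String[] args) {
        System.out.println("_key: " + columnTypeFor(Long.class));   // prints "_key: LONG"
        System.out.println("_val: " + columnTypeFor(Person.class)); // prints "_val: BINARY"
    }
}
```

In this sketch a cache of Long keys and Person values would expose _key as a plain numeric column and _val as a Binary column, which is where either of the two encodings discussed in the thread (Ignite BinaryObject bytes or Kryo bytes) would come in.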
