Hi Michael,

Having faced the same limitation, I have found these two libraries to be helpful:
- Frameless (https://github.com/typelevel/frameless)
- struct-type-encoder (https://benfradet.github.io/blog/2017/06/14/Deriving-Spark-Dataframe-schemas-with-Shapeless)

Both use Shapeless to derive Datasets. I hope it helps.

Patrick.

> On Nov 14, 2017, at 20:38, mlopez <michael.lopez....@gmail.com> wrote:
>
> Hello everyone!
>
> I'm a developer at a security ratings company. We've been moving to Spark
> for our data analytics, and nearly every dataset we have contains IP
> addresses or variable-length subnets. Katherine's descriptions of use cases
> and attempts to emulate networking types overlap with ours. I would add that
> we also need to write complex queries over subnets in addition to IP
> addresses.
>
> Has there been any update on this topic?
> https://github.com/apache/spark/pull/16478 was last updated in February of
> this year.
>
> I would also like to know whether it would be better to work toward IP
> networking types. Supposing Spark had UDT support, would it be just as good
> as built-in support for networking types? Where would they fall short? Would
> it be possible to pass custom rules to Catalyst for optimizing expressions
> with networking types?
>
> We have to write complex joins over predicates like subnet containment, and
> we have to resort to difficult-to-read tricks to ensure that Spark doesn't
> fall back to an inefficient join strategy. For example, it would be great to
> simply write `df1.join(df2, contains($"src_net", $"dst_net"))` to join
> records from one dataset whose subnets are contained in subnets of another.
>
> -----
> Michael Lopez
> Cheerful Engineer!
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
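P.S. On the subnet-containment predicate itself: here is a minimal, IPv4-only sketch in plain Scala, independent of Spark. The `contains` name just mirrors the hypothetical function in your example; wrapping a function like this in a UDF works today, but a UDF is opaque to Catalyst, so it would not by itself fix the join-strategy problem you describe — that is exactly where native networking types (or custom Catalyst rules) would help.

```scala
// Hypothetical sketch of a subnet-containment predicate over IPv4 CIDR
// strings such as "10.0.0.0/8". Not Spark API; in Spark this could be
// wrapped in a UDF, at the cost of being invisible to the optimizer.
object SubnetContains {
  // Parse "a.b.c.d/p" into (address packed into an Int, prefix length).
  def parseCidr(cidr: String): (Int, Int) = {
    val Array(ip, p) = cidr.split("/")
    val addr = ip.split("\\.").map(_.toInt).foldLeft(0)((acc, octet) => (acc << 8) | octet)
    (addr, p.toInt)
  }

  // True iff every address in `inner` also lies in `outer`:
  // the outer prefix must be no longer than the inner one, and the
  // two networks must agree on the outer prefix's bits.
  def contains(outer: String, inner: String): Boolean = {
    val (oAddr, oLen) = parseCidr(outer)
    val (iAddr, iLen) = parseCidr(inner)
    // Guard the oLen == 0 case: in the JVM, `-1 << 32` shifts by 32 mod 32 = 0.
    val mask = if (oLen == 0) 0 else -1 << (32 - oLen)
    oLen <= iLen && (oAddr & mask) == (iAddr & mask)
  }
}
```

For example, `SubnetContains.contains("10.0.0.0/8", "10.1.0.0/16")` is true, while swapping the arguments makes it false.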