You probably don't need to create a new kind of SchemaRDD. Instead I'd suggest taking a look at the data sources API that we are adding in Spark 1.2. There is not a ton of documentation, but the test cases show how to implement the various interfaces <https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>, and there is an example library for reading Avro data <https://github.com/databricks/spark-avro>.
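If it helps, here is a rough sketch of what a minimal read-only data source looks like against the 1.2 interfaces (a RelationProvider plus a TableScan relation). The package/class names, the "rows" option, and the two-column schema below are made up purely for illustration, and the exact imports may differ slightly depending on the release you build against, so treat the linked test cases as the authoritative reference:

package com.example.datasource

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, StructType, StructField, StringType, IntegerType}
import org.apache.spark.sql.sources.{RelationProvider, BaseRelation, TableScan}

// The USING clause in the DDL resolves to a class named DefaultSource
// inside the package you name.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    ExampleRelation(parameters("rows").toInt)(sqlContext)
}

// In 1.2, TableScan extends BaseRelation, so a relation only needs to
// provide a schema and a buildScan() that returns an RDD[Row].
case class ExampleRelation(numRows: Int)(@transient val sqlContext: SQLContext)
  extends TableScan {

  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)))

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(1 to numRows).map(i => Row(i, s"row_$i"))
}

Once that is on the classpath you should be able to register it with something like sqlContext.sql("CREATE TEMPORARY TABLE example USING com.example.datasource OPTIONS (rows '10')") and query it like any other table. The result of a query over it is a SchemaRDD, so there is no need to subclass SchemaRDD itself; the spark-avro project above follows the same pattern against a real storage format.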
On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera <nira...@wso2.com> wrote:

> Hi,
>
> I am evaluating Spark for an analytics component where we do batch
> processing of data using SQL.
>
> So, I am particularly interested in Spark SQL and in creating a SchemaRDD
> from an existing API [1].
>
> This API exposes elements in a database as data sources. Using the methods
> provided by this data source, we can access and edit data.
>
> So, I want to create a custom SchemaRDD using the methods and provisions
> of this API. I tried going through the Spark documentation and the Java
> docs, but unfortunately I was unable to come to a conclusion on whether
> this is actually possible.
>
> I would like to ask the Spark devs:
> 1. As of the current Spark release, can we make a custom SchemaRDD?
> 2. What is the extension point for a custom SchemaRDD? Or are there
> particular interfaces?
> 3. Could you please point me to the specific docs regarding this matter?
>
> Your help in this regard is highly appreciated.
>
> Cheers
>
> [1]
> https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>