You probably don't need to create a new kind of SchemaRDD. Instead I'd suggest taking a look at the data sources API that we are adding in Spark 1.2. There is not a ton of documentation, but the test cases show how to implement the various interfaces <https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources>, and there is an example library for reading Avro data <https://github.com/databricks/spark-avro>.
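If it helps, here is a rough sketch of what a minimal read-only data source looks like against the 1.2 interfaces (a RelationProvider plus a TableScan relation). The package/class names, the "rows" option, and the two-column schema below are made up purely for illustration, and the exact imports may differ slightly depending on the release you build against, so treat the linked test cases as the authoritative reference:

package com.example.datasource

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, StructType, StructField, StringType, IntegerType}
import org.apache.spark.sql.sources.{RelationProvider, BaseRelation, TableScan}

// The USING clause in the DDL resolves to a class named DefaultSource
// inside the package you name.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    ExampleRelation(parameters("rows").toInt)(sqlContext)
}

// In 1.2, TableScan extends BaseRelation, so a relation only needs to
// provide a schema and a buildScan() that returns an RDD[Row].
case class ExampleRelation(numRows: Int)(@transient val sqlContext: SQLContext)
  extends TableScan {

  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)))

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(1 to numRows).map(i => Row(i, s"row_$i"))
}

Once that is on the classpath you should be able to register it with something like sqlContext.sql("CREATE TEMPORARY TABLE example USING com.example.datasource OPTIONS (rows '10')") and query it like any other table. The result of a query over it is a SchemaRDD, so there is no need to subclass SchemaRDD itself; the spark-avro project above follows the same pattern against a real storage format.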
On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera <nira...@wso2.com> wrote:

> Hi,
>
> I am evaluating Spark for an analytics component where we do batch
> processing of data using SQL.
>
> So, I am particularly interested in Spark SQL and in creating a SchemaRDD
> from an existing API [1].
>
> This API exposes elements in a database as data sources. Using the methods
> provided by this data source, we can access and edit data.
>
> So, I want to create a custom SchemaRDD using the methods and provisions
> of this API. I tried going through the Spark documentation and the Java
> docs, but unfortunately I was unable to come to a conclusion on whether
> this is actually possible.
>
> I would like to ask the Spark devs:
> 1. As of the current Spark release, can we make a custom SchemaRDD?
> 2. What is the extension point for a custom SchemaRDD? Or are there
> particular interfaces?
> 3. Could you please point me to the specific docs regarding this matter?
>
> Your help in this regard is highly appreciated.
>
> Cheers
>
> [1]
> https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics
>
> --
> *Niranda Perera*
> Software Engineer, WSO2 Inc.
> Mobile: +94-71-554-8430
> Twitter: @n1r44 <https://twitter.com/N1R44>