For SQL compatibility and data extraction from Spark, Optiq
(https://github.com/julianhyde/optiq), which just entered the Apache
Incubator and is used in Hadoop / Drill and some other projects, has a
Spark plugin.
Tom
On 22/05/14 23:24, Mattmann, Chris A (3980) wrote:
Hi Guys,
We're currently working at JPL on figuring out how we can make a
Shark/Spark interface for Apache OODT which can be used for ETL and
workflow management:
http://oodt.apache.org/
OODT currently supports RDBMS and Solr/Lucene, and we are also working
on a Gora plugin for it.
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-5th floor
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: William Kang <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 22, 2014 7:49 AM
To: "[email protected]" <[email protected]>
Subject: ETL and workflow management on Spark
Hi,
We are moving toward adopting the full Spark stack. So far, we have used
Shark to do some ETL work, which is not bad but is not perfect either. We
ended up writing UDFs, UDGFs, and UDAFs that could have been avoided if we
could use Pig.
Do you have any suggestions for an ETL solution in the Spark stack?
And does anyone have a working workflow management solution with Spark?
Many thanks.
Cao
--
*Tom Barber* | Technical Director
meteorite bi
*T:* +44 20 8133 3730
*W:* www.meteorite.bi | *Skype:* meteorite.consulting
*A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK