Yep I was looking into using the jar service loader.

I pushed a rough draft to my fork of Spark:
https://github.com/JDrit/spark/commit/946186e3f17ddcc54acf2be1a34aebf246b06d2f

Right now it will use the first alias it finds, but I can change that to
check them all and report an error if it finds duplicate aliases. I tested
this locally with the spark-avro package and it lets me use "avro" as the
specified format. It defaults to the class name, just as it would if
nothing were aliased.
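
Roughly, the lookup works like this (just a sketch; the trait and object
names below are illustrative and not necessarily what the commit uses, and
it already includes the duplicate check Patrick mentioned rather than
taking the first match):

    import java.util.ServiceLoader
    import scala.collection.JavaConverters._

    // Hypothetical trait a data source package implements to register an alias.
    trait DataSourceRegister {
      def shortName(): String
    }

    object DataSourceResolver {
      // Resolve a user-supplied format string to a provider class name.
      def lookup(format: String, loader: ClassLoader): String = {
        // Discover all providers listed under META-INF/services on the classpath.
        val providers = ServiceLoader.load(classOf[DataSourceRegister], loader).asScala
        providers.filter(_.shortName().equalsIgnoreCase(format)).toList match {
          case Nil           => format                  // no alias found: assume a class name
          case single :: Nil => single.getClass.getName // unique alias match
          case multiple      =>                         // ambiguous alias: force the FQCN
            throw new RuntimeException(
              s"Multiple data sources register the alias '$format': " +
                multiple.map(_.getClass.getName).mkString(", ") +
                ". Please use the fully qualified class name instead.")
        }
      }
    }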


On Thu, Jul 30, 2015 at 11:20 AM Michael Armbrust <mich...@databricks.com>
wrote:

> +1
>
> On Thu, Jul 30, 2015 at 11:18 AM, Patrick Wendell <pwend...@gmail.com>
> wrote:
>
>> Yeah this could make sense - allowing data sources to register a short
>> name. What mechanism did you have in mind? To use the jar service loader?
>>
>> The only issue is that there could be conflicts since many of these are
>> third party packages. If the same name were registered twice I'm not sure
>> what the best behavior would be. Ideally in my mind if the same shortname
>> were registered twice we'd force the user to use a fully qualified name and
>> say the short name is ambiguous.
>>
>> Patrick
>> On Jul 30, 2015 9:44 AM, "Joseph Batchik" <josephbatc...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> There are now starting to be a lot of data source packages for Spark. An
>>> annoyance I see is that I have to type in the full class name, like:
>>>
>>> sqlContext.read.format("com.databricks.spark.avro").load(path).
>>>
>>> Spark internally has formats such as "parquet" and "jdbc" registered and
>>> it would be nice to be able to just type "avro", "redshift", etc. as
>>> well. Would it be a good idea to use something like a service loader to
>>> allow data sources defined in other packages to register themselves with
>>> Spark? I think that this would make it easier for end users. I would be
>>> interested in adding this, please let me know what you guys think.
>>>
>>> - Joe
>>>
>>>
>>>
>
