[GitHub] [druid] b-slim commented on issue #9780: Support directly reading and writing Druid data from Spark

GitBox Wed, 24 Feb 2021 10:50:49 -0800


b-slim commented on issue #9780:
URL: https://github.com/apache/druid/issues/9780#issuecomment-785295359



   @JulianJaffePinterest Let me start by saying this is great work, and I would 
love to share with you some of the experience we had when building a similar 
connector between Hive and Druid. 
   The main point I want to share with you is why we picked the Hive Repo as 
the best place for such a connector.
    - The major APIs used by the integration are the Hive Splits and Partitions 
(I think in your case it is DataSourceV2). By having the code within the Hive 
repo we had better compile-time dependency checks and faster iteration (no 
need for a public release to get things working with the bleeding-edge master), 
while most of the Druid APIs are RESTful and easy to work with without public 
releases of Druid.
    - From a testing perspective we found it much simpler to run the tests 
within the Hive integration tests, since most of the work and heavy lifting is 
done by Hive (in your case it will be Spark). Again, this was a big win for us.
    - From a feature-set perspective we found ourselves doing some extra work 
in the Hive partition logic or planning logic to enable bloom filter pushdown 
or well-balanced Druid segments, and again having the connector living 
within the Hive repo made things much simpler.
   
   Those are the 2 cents that I want to share; they should be considered an 
opinion and not a review.


