I've submitted https://github.com/apache/druid/pull/9454 today to add an `OnHeapMemorySegmentWriteOutMediumFactory`.
On Mon, Mar 2, 2020 at 8:57 AM Oğuzhan Mangır <oguzhan.man...@trendyol.com> wrote:

> On 2020/02/26 13:26:13, itai yaffe <itai.ya...@gmail.com> wrote:
> > Hey,
> > Per Gian's proposal, and following this thread in Druid user group (
> > https://groups.google.com/forum/#!topic/druid-user/FqAuDGc-rUM) and this
> > thread in Druid Slack channel (
> > https://the-asf.slack.com/archives/CJ8D1JTB8/p1581452302483600), I'd like
> > to start discussing the options of having Spark-based ingestion into Druid.
> >
> > There's already an old project (https://github.com/metamx/druid-spark-batch)
> > for that, so perhaps we can use that as a starting point.
> >
> > The thread on Slack suggested 2 approaches:
> >
> > 1. *Simply replacing the Hadoop MapReduce ingestion task* - having a
> >    Spark batch job that ingests data into Druid, as a simple replacement of
> >    the Hadoop MapReduce ingestion task.
> >    Meaning - your data pipeline will have a Spark job to pre-process the
> >    data (similar to what some of us have today), and another Spark job to read
> >    the output of the previous job, and create Druid segments (again -
> >    following the same pattern as the Hadoop MapReduce ingestion task).
> > 2. *Druid output sink for Spark* - rather than having 2 separate Spark
> >    jobs, 1 for pre-processing the data and 1 for ingesting the data into
> >    Druid, you'll have a single Spark job that pre-processes the data and
> >    creates Druid segments directly, e.g sparkDataFrame.write.format("druid")
> >    (as suggested by omngr on Slack).
> >
> > I personally prefer the 2nd approach - while it might be harder to
> > implement, it seems the benefits are greater in this approach.
> >
> > I'd like to hear your thoughts and to start getting this ball rolling.
> >
> > Thanks,
> > Itai