+1 to Bhavani.

To be more specific, you need to set the following property in your config
file:

hoodie.deltastreamer.source.dfs.root=hdfs://path/to/file
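
For reference, a minimal properties file for this DFS + Hive sync setup
could look like the sketch below. The record key, partition path field,
schema file locations, and Hive settings are assumptions; adjust them to
your dataset:

hoodie.deltastreamer.source.dfs.root=hdfs://path/to/file
# record key and partition path (assumed field names)
hoodie.datasource.write.recordkey.field=_row_key
hoodie.datasource.write.partitionpath.field=partition_path
# schema files for FilebasedSchemaProvider (assumed locations)
hoodie.deltastreamer.schemaprovider.source.schema.file=hdfs:///tmp/hoodie/source.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=hdfs:///tmp/hoodie/target.avsc
# Hive sync settings (assumed database and JDBC url)
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=cow_table
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000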

Here is a command that loads the data and syncs to Hive in the same run:

spark-submit --master local[1] \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /path_to_hudi-utilities-bundle-jar \
  --storage-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field hudi_event_ts \
  --target-base-path hdfs:///tmp/hoodie/cow_table \
  --target-table cow_table \
  --props hdfs:///tmp/hoodie/fg-kafka-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-limit 5000 \
  --enable-hive-sync
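
Once the run finishes with --enable-hive-sync, the table should be
registered in the Hive metastore and queryable, e.g. via beeline (the
HiveServer2 URL below is an assumption; for Hive-side queries the
hudi-hadoop-mr bundle also needs to be on Hive's classpath):

beeline -u jdbc:hive2://localhost:10000 -e "SELECT COUNT(*) FROM default.cow_table"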

On Tue, Oct 22, 2019 at 11:56 AM Bhavani Sudha <[email protected]>
wrote:

> Hi Qian,
> Here are some useful links on using DeltaStreamer:
> - https://hudi.apache.org/writing_data.html#deltastreamer
> - What are some ways to write a Hudi dataset:
>   https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-WhataresomewaystowriteaHudidataset
> - How can I now query the Hudi dataset I just wrote:
>   https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowcanInowquerytheHudidatasetIjustwrote
>
> Thanks,
> Sudha
>
> On Mon, Oct 21, 2019 at 8:09 PM Qian Wang <[email protected]> wrote:
>
> > Hi,
> >
> > Since DeltaStreamer can read data from DFS and save it as a Hudi
> > dataset, could anyone give me an example DeltaStreamer command that
> > loads DFS data and saves it to Hive? Thanks.
> >
> > Best,
> > Qian
> >
>