+1 to Bhavani. To be more specific, you need to set the following property in your config file ->
hoodie.deltastreamer.source.dfs.root=hdfs://path/to/file

Command to load and save to Hive simultaneously:

spark-submit --master local[1] \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /path_to_hudi-utilities-bundle-jar \
  --storage-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field hudi_event_ts \
  --target-base-path hdfs:///tmp/hoodie/cow_table \
  --target-table cow_table \
  --props hdfs:///tmp/hoodie/fg-kafka-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-limit 5000 \
  --enable-hive-sync

On Tue, Oct 22, 2019 at 11:56 AM Bhavani Sudha <[email protected]> wrote:

> Hi Qian,
> Here are some useful links on using DeltaStreamer:
> https://hudi.apache.org/writing_data.html#deltastreamer
> What are some ways to write a Hudi dataset
> <https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-WhataresomewaystowriteaHudidataset>
> How can I now query the Hudi dataset I just wrote
> <https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowcanInowquerytheHudidatasetIjustwrote>
>
> Thanks,
> Sudha
>
> On Mon, Oct 21, 2019 at 8:09 PM Qian Wang <[email protected]> wrote:
>
> > Hi,
> >
> > As DeltaStreamer can read DFS data and save it as a Hudi dataset, can
> > anyone give me an example DeltaStreamer command to load DFS data and
> > save it to Hive? Thanks.
> >
> > Best,
> > Qian
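
For completeness, the props file passed via --props carries the source, schema, and Hive
sync settings. A minimal sketch of what it could look like for this setup, assuming
FilebasedSchemaProvider; the record key, partition field, schema file paths, and Hive
values below are placeholders to swap for your own:

# Input location for the DFS JSON source
hoodie.deltastreamer.source.dfs.root=hdfs://path/to/file
# Record key and partition path fields of the incoming records (placeholders)
hoodie.datasource.write.recordkey.field=_row_key
hoodie.datasource.write.partitionpath.field=partition
# Avro schema files read by FilebasedSchemaProvider (placeholder paths)
hoodie.deltastreamer.schemaprovider.source.schema.file=hdfs:///tmp/hoodie/source.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=hdfs:///tmp/hoodie/target.avsc
# Hive sync settings, used because --enable-hive-sync is passed (placeholder values)
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=cow_table
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.partition_fields=partition
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor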

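Once the job finishes and Hive sync succeeds, the table should appear in the Hive
metastore and be queryable directly, e.g. via beeline (the JDBC URL here is a
placeholder matching the sketch above):

beeline -u jdbc:hive2://localhost:10000 -e "SELECT * FROM cow_table LIMIT 10"

Hudi adds metadata columns such as _hoodie_commit_time to each row, so seeing those
populated is a quick way to confirm the data was written through Hudi.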