Re: How i can pass Hoodie configurations to spark job

rahuledavalath Fri, 08 Mar 2019 02:05:44 -0800

On 2019/03/08 09:29:20, "[email protected]" <[email protected]> wrote: 
>  
> These warning messages might be misleading in this context. It looks like they are
> coming from the Kafka source. At a very high level, DeltaStreamer uses a single
> properties object to configure everything, so these messages could be coming from
> a component that does not understand the other configs. Can you let the
> delta-streamer run to completion? Are you seeing errors or exceptions stopping
> the job?
>     On Friday, March 8, 2019, 12:05:50 AM PST, <[email protected]> 
> wrote:  
>  
>  
> 
> On 2019/03/07 18:35:54, Vinoth Chandar <[email protected]> wrote: 
> > Hi Rahul,
> > 
> > Can you please subscribe to the mailing list? Otherwise, each reply
> > requires a moderator to approve before it can show up :)
> > 
> > Thanks
> > Vinoth
> > 
> > On Thu, Mar 7, 2019 at 9:02 AM Balaji Varadarajan
> > <[email protected]> wrote:
> > 
> > > >  It depends on which mechanism you use:
> > > > 1. For the Spark DataSource route, you can use the "options" API
> > > > of DataFrameWriter to pass in these configs. Here is an example from
> > > > http://hudi.apache.org/incremental_processing.html
> > > > inputDF.write()
> > > >        .format("com.uber.hoodie")
> > > >        .options(clientOpts) // any of the Hudi client opts can be passed in as well
> > > >        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
> > > >        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
> > > >        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
> > > >        .option(HoodieWriteConfig.TABLE_NAME, tableName)
> > > >        .mode(SaveMode.Append)
> > > >        .save(basePath);
> > > > 2. If you use HoodieWriteClient directly, you can simply construct
> > > > a HoodieWriteConfig object with the configs in the link you
> > > > mentioned.
> > > > 3. When using the HoodieDeltaStreamer tool to ingest, you can set the configs
> > > > in a properties file and pass the file as the cmdline argument "--props"
> > > >
> > > > All file-size configs must be specified in bytes.
> > > Balaji.V
> > >
> > >
> > >    On Thursday, March 7, 2019, 7:00:55 AM PST, [email protected] <
> > > [email protected]> wrote:
> > >
> > >  Hi All
> > > > I have found Hoodie-related configurations at
> > > > http://hudi.apache.org/configurations.html.  Please tell me how we can pass
> > > > these configurations to the Spark job.  Also, for the file-size-related
> > > > configs, in which unit should I give the value: MB, GB, or bytes?
> > >
> > > Thanks & Regards
> > > Rahul P
> > >
> > 
> 
> 
> 
> I tried testing the properties with the --props option. I added some of the
> properties to the kafka.source.properties file and passed it with the --props
> option, but the configurations are reported as not valid. I think they are not
> meant for the DeltaStreamer. Can you please give the proper properties for the
> DeltaStreamer? My intention is inline compaction.
> 
> WARN VerifiableProperties: Property OPERATION_OPT_KEY is not valid
> INFO VerifiableProperties: Property auto.offset.reset is overridden to smallest
> WARN VerifiableProperties: Property compactionSmallFileSize is not valid
> WARN VerifiableProperties: Property hoodie.compact.inline is not valid
> 
> 
> 
> Thanks & Regards
> Rahul P
>   


It is not compacting, and I do not see anything related to compaction in the
Spark job logs. Can you please tell me which Java class reads the value of the
"hoodie.compact.inline" option?
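
On the earlier point in this thread that file-size configs must be given in
bytes, a minimal Java sketch of computing such values for a --props file might
look like the following. The class name HoodieSizeProps and the 100 MB figure
are purely illustrative, and the key hoodie.parquet.small.file.limit is an
assumption that should be checked against the configurations page; only
hoodie.compact.inline appears in this thread.

```java
import java.util.Properties;

// Hypothetical helper (not part of Hudi): builds properties whose
// file-size values are expressed in bytes.
public class HoodieSizeProps {

    // Convert megabytes (1 MB = 1024 * 1024 bytes) to bytes.
    public static long megabytes(long mb) {
        return mb * 1024L * 1024L;
    }

    public static Properties build() {
        Properties props = new Properties();
        // Key taken from this thread.
        props.setProperty("hoodie.compact.inline", "true");
        // Assumed key name; verify it against the configurations page.
        props.setProperty("hoodie.parquet.small.file.limit",
                String.valueOf(megabytes(100)));
        return props;
    }

    public static void main(String[] args) {
        // Print key=value lines that could be pasted into a properties file.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The printed lines use the dotted string keys rather than Java constant or
builder-method names such as OPERATION_OPT_KEY or compactionSmallFileSize.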