On 2019/03/08 10:12:18, [email protected] <[email protected]>
wrote:
>
>
> On 2019/03/08 10:05:16, [email protected] <[email protected]>
> wrote:
> >
> >
> > On 2019/03/08 09:29:20, "[email protected]" <[email protected]> wrote:
> > >
> > > These warn messages might be misleading in this context. Looks like they are
> > > coming from the Kafka source. At a very high level, DeltaStreamer uses a single
> > > properties object to configure everything, so these warnings could be coming
> > > from a component that does not understand the other configs. Can you let the
> > > delta-streamer run to completion? Are you seeing errors/exceptions
> > > stopping the job?
> > > On Friday, March 8, 2019, 12:05:50 AM PST, <[email protected]>
> > > wrote:
> > >
> > >
> > >
> > > On 2019/03/07 18:35:54, Vinoth Chandar <[email protected]> wrote:
> > > > Hi Rahul,
> > > >
> > > > Can you please subscribe to the mailing list? Otherwise, each reply
> > > > requires a moderator to approve before it can show up :)
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Thu, Mar 7, 2019 at 9:02 AM Balaji Varadarajan
> > > > <[email protected]> wrote:
> > > >
> > > > > It depends on which mechanism you use :
> > > > > 1. For Spark DataSource route, you can use the "options" API
> > > > > of DataFrameWriter to pass in these configs. Here is an example from
> > > > > http://hudi.apache.org/incremental_processing.html
> > > > > inputDF.write()
> > > > > .format("com.uber.hoodie")
> > > > > > .options(clientOpts) // any of the Hudi client opts can be passed in as well
> > > > > .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(),
> > > > > "_row_key")
> > > > > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(),
> > > > > "partition")
> > > > > .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(),
> > > > > "timestamp")
> > > > > .option(HoodieWriteConfig.TABLE_NAME, tableName)
> > > > > .mode(SaveMode.Append)
> > > > > .save(basePath);
> > > > > > 2. If you use HoodieWriteClient directly, you can simply construct a
> > > > > > HoodieWriteConfig object with the configs in the link you mentioned.
> > > > > > 3. When using the HoodieDeltaStreamer tool to ingest, you can set the
> > > > > > configs in a properties file and pass the file as the cmdline argument
> > > > > > "--props".
> > > > >
> > > > > > All the file-size configs must be specified in bytes.
> > > > > Balaji.V
> > > > >
> > > > >
> > > > > > On Thursday, March 7, 2019, 7:00:55 AM PST, [email protected]
> > > > > > <[email protected]> wrote:
> > > > >
> > > > > Hi All
> > > > > > I have found the Hoodie-related configurations at
> > > > > > http://hudi.apache.org/configurations.html. Please tell me how we can
> > > > > > pass these configurations to the Spark job. Also, for the file-size
> > > > > > related configs, in which unit do I need to give the value: MB, GB,
> > > > > > or bytes?
> > > > >
> > > > > Thanks & Regards
> > > > > Rahul P
> > > > >
> > > >
> > >
> > >
> > >
> > > I tried testing the properties with the --props option. I added some of the
> > > properties to the kafka.source.properties file and passed it with the --props
> > > option, but the configurations are reported as not valid. I think they are not
> > > meant for the deltastreamer. Can you please give the proper properties for the
> > > deltastreamer? My intention is inline compaction.
> > >
> > > WARN VerifiableProperties: Property OPERATION_OPT_KEY is not valid
> > > INFO VerifiableProperties: Property auto.offset.reset is overridden to
> > > smallest
> > > WARN VerifiableProperties: Property compactionSmallFileSize is not valid
> > > WARN VerifiableProperties: Property hoodie.compact.inline is not valid
> > >
> > >
> > >
> > > Thanks & Regards
> > > Rahul P
> > >
> >
> >
> > It is not compacting, and I am not observing anything related to compaction in
> > the Spark job logs. Can you please tell me in which Java class the value of the
> > "hoodie.compact.inline" option is read?
> >
> >
> >
> >
> FYI
> I am using the command below and checking whether it is compacting while
> ingesting data through deltastreamer
>
> spark-submit --class
> com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
> hoodie-utilities-0.4.5.jar --storage-type MERGE_ON_READ --source-class
> --source-ordering-field evemt_time --target-base-path /MOR10 --target-table
> MOR10 --props /hudi/kafka-source.properties --schemaprovider-class
> com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
>
++Added the missing source class argument
I am using the command below and checking whether it is compacting while ingesting
data through deltastreamer
spark-submit --class
com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
hoodie-utilities-0.4.5.jar --storage-type MERGE_ON_READ --source-class
com.uber.hoodie.utilities.sources.JsonKafkaSource --source-ordering-field
evemt_time --target-base-path /MOR10 --target-table MOR10 --props
/hudi/kafka-source.properties --schemaprovider-class
com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
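
For reference, the earlier note that file-size configs must be given in bytes can be sketched in Java as below. This is only an illustration under assumptions: the class name HoodieSizeProps and the helper mbToBytes are hypothetical, the 100 MB value is an arbitrary example, and the keys hoodie.parquet.small.file.limit and hoodie.compact.inline are assumed to be the property-file names of the settings discussed above. Whether DeltaStreamer in 0.4.5 actually honors them for inline compaction is not confirmed anywhere in this thread.

```java
import java.util.Properties;

public class HoodieSizeProps {
    // Hypothetical helper: convert a size in megabytes to bytes,
    // since the file-size configs expect plain byte counts.
    static long mbToBytes(long mb) {
        return mb * 1024L * 1024L;
    }

    // Build a Properties object holding the example settings.
    // The keys are assumptions taken from the Hudi configurations page.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("hoodie.compact.inline", "true");
        props.setProperty("hoodie.parquet.small.file.limit",
                String.valueOf(mbToBytes(100))); // 100 MB expressed in bytes
        return props;
    }

    public static void main(String[] args) {
        // Print key=value lines in the shape a properties file uses.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The printed key=value lines are in the shape that a file passed with --props would contain.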