On 2019/03/08 19:25:25, Vinoth Chandar <[email protected]> wrote: 
> Hi Rahul,
> 
> I have improved the docs around configs here
> https://github.com/apache/incubator-hudi/pull/10
> 
> Please take a look and share your thoughts.
> @others, please review as well.
> 
> Thanks
> Vinoth
> 
> On Fri, Mar 8, 2019 at 10:09 AM Vinoth Chandar <[email protected]> wrote:
> 
> > Hi Rahul,
> >
> > You mentioned you are doing inserts alone, correct? Then there won't be
> > any log files and thus no compaction.
> > Is this what you are seeing?
> >
> > Thanks
> > Vinoth
> >
> > On Fri, Mar 8, 2019 at 2:15 AM [email protected] <
> > [email protected]> wrote:
> >
> >>
> >>
> >> On 2019/03/08 10:12:18, [email protected] <
> >> [email protected]> wrote:
> >> >
> >> >
> >> > On 2019/03/08 10:05:16, [email protected] <
> >> [email protected]> wrote:
> >> > >
> >> > >
> >> > > On 2019/03/08 09:29:20, "[email protected]" <[email protected]>
> >> wrote:
> >> > > >
> >> > > > These warning messages might be misleading in this context. It looks like
> >> they are coming from the Kafka source. At a very high level, DeltaStreamer uses a
> >> single properties object to configure everything, and this warning could be
> >> coming from a component that does not understand the other configs. Can you let
> >> the delta-streamer run to completion? Are you seeing errors/exceptions
> >> stopping the job?
> >> > > >     On Friday, March 8, 2019, 12:05:50 AM PST, <
> >> [email protected]> wrote:
> >> > > >
> >> > > >
> >> > > >
> >> > > > On 2019/03/07 18:35:54, Vinoth Chandar <[email protected]> wrote:
> >> > > > > Hi Rahul,
> >> > > > >
> >> > > > > Can you please subscribe to the mailing list? Otherwise, each
> >> reply
> >> > > > > requires a moderator to approve before it can show up :)
> >> > > > >
> >> > > > > Thanks
> >> > > > > Vinoth
> >> > > > >
> >> > > > > On Thu, Mar 7, 2019 at 9:02 AM Balaji Varadarajan
> >> > > > > <[email protected]> wrote:
> >> > > > >
> >> > > > > >  It depends on which mechanism you use :
> >> > > > > > 1. For Spark DataSource route, you can use the "options" API
> >> > > > > > of DataFrameWriter to pass in these configs. Here is an example
> >> from
> >> > > > > > http://hudi.apache.org/incremental_processing.html
> >> > > > > > inputDF.write()
> >> > > > > >        .format("com.uber.hoodie")
> >> > > > > >        .options(clientOpts) // any of the Hudi client opts can be passed in as well
> >> > > > > >        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
> >> > > > > >        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
> >> > > > > >        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
> >> > > > > >        .option(HoodieWriteConfig.TABLE_NAME, tableName)
> >> > > > > >        .mode(SaveMode.Append)
> >> > > > > >        .save(basePath);
> >> > > > > > 2. For an approach involving using HoodieWriteClient directly,
> >> you can
> >> > > > > > simply construct HoodieWriteConfig object with the configs in
> >> the link you
> >> > > > > > mentioned.
> >> > > > > > 3. When using HoodieDeltaStreamer tool to ingest, you can set
> >> the configs
> >> > > > > > in properties file and pass the file as the cmdline argument
> >> "--props"
> >> > > > > >
> >> > > > > > All the file-size configs must be specified in bytes.
> >> > > > > > Balaji.V
> >> > > > > >
> >> > > > > >
> >> > > > > >    On Thursday, March 7, 2019, 7:00:55 AM PST,
> >> [email protected] <
> >> > > > > > [email protected]> wrote:
> >> > > > > >
> >> > > > > >  Hi All
> >> > > > > > I have found hoodie related configurations in
> >> > > > > > http://hudi.apache.org/configurations.html.  Please tell me how
> >> we can pass
> >> > > > > > these configurations to the Spark job.  Also, for the
> >> file-size
> >> > > > > > related configs, in which unit should I give the value:
> >> MB, GB, or bytes?
> >> > > > > >
> >> > > > > > Thanks & Regards
> >> > > > > > Rahul P
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > I tried testing the properties with the --props option. I added some
> >> of the properties to the kafka.source.properties file and passed it with the
> >> --props option. But the configurations are reported as not valid. I think they are not for
> >> the deltastreamer.  Can you please give the proper properties for the
> >> deltastreamer? My intention is inline compaction.
> >> > > >
> >> > > > WARN VerifiableProperties: Property OPERATION_OPT_KEY is not valid
> >> > > > INFO VerifiableProperties: Property auto.offset.reset is overridden
> >> to smallest
> >> > > > WARN VerifiableProperties: Property compactionSmallFileSize is not
> >> valid
> >> > > > WARN VerifiableProperties: Property hoodie.compact.inline is not
> >> valid
> >> > > >
> >> > > >
> >> > > >
> >> > > > Thanks & Regards
> >> > > > Rahul P
> >> > > >
> >> > >
> >> > >
> >> > > It is not compacting, and I am not observing anything related to compaction in
> >> the Spark job logs. Can you please tell me in which Java class the
> >> "hoodie.compact.inline" option's value is read?
> >> > >
> >> > >
> >> > >
> >> > >
> >> > FYI
> >> > I am using the command below and checking whether it is compacting while
> >> ingesting data through deltastreamer
> >> >
> >> > spark-submit --class
> >> com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
> >> hoodie-utilities-0.4.5.jar --storage-type MERGE_ON_READ --source-class
> >>  --source-ordering-field evemt_time   --target-base-path /MOR10
> >> --target-table MOR10 --props /hudi/kafka-source.properties
> >>  --schemaprovider-class
> >> com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
> >> >
> >> ++Added the missing source class argument
> >>
> >> I am using the command below and checking whether it is compacting while
> >> ingesting data through deltastreamer
> >>
> >>  spark-submit --class
> >> com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
> >> hoodie-utilities-0.4.5.jar --storage-type MERGE_ON_READ --source-class
> >>  com.uber.hoodie.utilities.sources.JsonKafkaSource  --source-ordering-field
> >> evemt_time   --target-base-path /MOR10 --target-table MOR10 --props
> >> /hudi/kafka-source.properties   --schemaprovider-class
> >> com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
> >>
> >
> Dear Vinoth

      Thanks for improving the configuration-related docs. It's much clearer now.

Thanks & Regards
Rahul 
