Hi Rahul,

I have improved the docs around configs here
https://github.com/apache/incubator-hudi/pull/10
Please take a look and share thoughts.

@others, please review as well.

Thanks
Vinoth

On Fri, Mar 8, 2019 at 10:09 AM Vinoth Chandar <[email protected]> wrote:

> Hi Rahul,
>
> You mentioned you are doing inserts alone, correct? Then there won't be
> any log files and thus no compaction.
> Is this what you are seeing?
>
> Thanks
> Vinoth
>
> On Fri, Mar 8, 2019 at 2:15 AM [email protected] <[email protected]> wrote:
>
>> On 2019/03/08 10:12:18, [email protected] <[email protected]> wrote:
>> >
>> > On 2019/03/08 10:05:16, [email protected] <[email protected]> wrote:
>> > >
>> > > On 2019/03/08 09:29:20, "[email protected]" <[email protected]> wrote:
>> > > >
>> > > > These warn messages might be misleading in this context. Looks like
>> > > > they are coming from Kafka Source. At a very high level, DeltaStreamer
>> > > > uses a single properties object to configure all things, and this error
>> > > > could be coming from a component that does not understand the other
>> > > > configs. Can you let the delta-streamer run to completion? Are you
>> > > > seeing errors/exceptions stopping the job?
>> > > >
>> > > > On Friday, March 8, 2019, 12:05:50 AM PST, <[email protected]> wrote:
>> > > >
>> > > > On 2019/03/07 18:35:54, Vinoth Chandar <[email protected]> wrote:
>> > > > > Hi Rahul,
>> > > > >
>> > > > > Can you please subscribe to the mailing list? Otherwise, each reply
>> > > > > requires a moderator to approve before it can show up :)
>> > > > >
>> > > > > Thanks
>> > > > > Vinoth
>> > > > >
>> > > > > On Thu, Mar 7, 2019 at 9:02 AM Balaji Varadarajan
>> > > > > <[email protected]> wrote:
>> > > > >
>> > > > > > It depends on which mechanism you use:
>> > > > > > 1. For Spark DataSource route, you can use the "options" API
>> > > > > > of DataFrameWriter to pass in these configs.
>> > > > > > Here is an example from
>> > > > > > http://hudi.apache.org/incremental_processing.html
>> > > > > > inputDF.write()
>> > > > > >     .format("com.uber.hoodie")
>> > > > > >     .options(clientOpts) // any of the Hudi client opts can be passed in as well
>> > > > > >     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
>> > > > > >     .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
>> > > > > >     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
>> > > > > >     .option(HoodieWriteConfig.TABLE_NAME, tableName)
>> > > > > >     .mode(SaveMode.Append)
>> > > > > >     .save(basePath);
>> > > > > > 2. For an approach that uses HoodieWriteClient directly, you can
>> > > > > > simply construct a HoodieWriteConfig object with the configs in the
>> > > > > > link you mentioned.
>> > > > > > 3. When using the HoodieDeltaStreamer tool to ingest, you can set the
>> > > > > > configs in a properties file and pass the file as the cmdline
>> > > > > > argument "--props".
>> > > > > >
>> > > > > > All the file-size configs must be given in bytes.
>> > > > > > Balaji.V
>> > > > > >
>> > > > > > On Thursday, March 7, 2019, 7:00:55 AM PST, [email protected]
>> > > > > > <[email protected]> wrote:
>> > > > > >
>> > > > > > Hi All
>> > > > > > I have found hoodie related configurations in
>> > > > > > http://hudi.apache.org/configurations.html. Please tell me how we
>> > > > > > can pass these configurations to the Spark job. Also, for the
>> > > > > > file-size related configs, in which unit should I give the value:
>> > > > > > MB, GB, or bytes?
>> > > > > >
>> > > > > > Thanks & Regards
>> > > > > > Rahul P
>> > > > > >
>> > > >
>> > > > I tried testing the properties with the --props option. I added some of
>> > > > the properties in the kafka.source.properties file and passed it with
>> > > > the --props option. But the configurations are not valid.
>> > > > I think they are not meant for the deltastreamer. Can you please give
>> > > > the proper properties for the deltastreamer? My intention is inline
>> > > > compaction.
>> > > >
>> > > > WARN VerifiableProperties: Property OPERATION_OPT_KEY is not valid
>> > > > INFO VerifiableProperties: Property auto.offset.reset is overridden to smallest
>> > > > WARN VerifiableProperties: Property compactionSmallFileSize is not valid
>> > > > WARN VerifiableProperties: Property hoodie.compact.inline is not valid
>> > > >
>> > > > Thanks & Regards
>> > > > Rahul P
>> > > >
>> > >
>> > > It is not compacting, and I am not observing anything related to
>> > > compaction in the Spark job logs. Can you please tell me which Java class
>> > > reads the value of the "hoodie.compact.inline" option?
>> > >
>> >
>> > FYI
>> > I am using the command below and checking whether it is compacting while
>> > injecting data through deltastreamer
>> >
>> > spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
>> >   hoodie-utilities-0.4.5.jar \
>> >   --storage-type MERGE_ON_READ \
>> >   --source-class --source-ordering-field evemt_time \
>> >   --target-base-path /MOR10 --target-table MOR10 \
>> >   --props /hudi/kafka-source.properties \
>> >   --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
>> >
>>
>> ++Added missed source class argument
>>
>> I am using the command below and checking whether it is compacting while
>> injecting data through deltastreamer
>>
>> spark-submit --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
>>   hoodie-utilities-0.4.5.jar \
>>   --storage-type MERGE_ON_READ \
>>   --source-class com.uber.hoodie.utilities.sources.JsonKafkaSource \
>>   --source-ordering-field evemt_time \
>>   --target-base-path /MOR10 --target-table MOR10 \
>>   --props /hudi/kafka-source.properties \
>>   --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider
>>
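
For reference, the thread states that all file-size configs must be given in bytes. Below is a minimal sketch of how MB/GB values could be converted and printed as properties-file lines. The class HoodieSizeProps and its helpers mb/gb are hypothetical and not part of Hudi. The key "hoodie.compact.inline" appears in the thread; the two file-size keys and the 100 MB / 1 GB values are assumptions for illustration, so please check the key names against the configurations page before using them.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: converts human-friendly sizes into
// the byte values that the file-size configs expect.
final class HoodieSizeProps {

    // Megabytes to bytes.
    static long mb(long n) {
        return n * 1024L * 1024L;
    }

    // Gigabytes to bytes.
    static long gb(long n) {
        return n * 1024L * 1024L * 1024L;
    }

    // Builds an example set of properties. The file-size key names are
    // assumptions; verify them on the Hudi configurations page.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("hoodie.compact.inline", "true");
        props.setProperty("hoodie.parquet.small.file.limit", String.valueOf(mb(100)));
        props.setProperty("hoodie.parquet.max.file.size", String.valueOf(gb(1)));
        return props;
    }

    public static void main(String[] args) {
        // Print key=value lines in the format a properties file uses.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The printed key=value lines could then be added to the properties file that is passed to the deltastreamer with "--props".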
