Hi, Are you saying these classes needs to change? If so, understood. But are you planning on renaming configs or relocating them? We dont want existing users to do any upgrading or reconfigurations
On Fri, Dec 13, 2019 at 10:28 AM lamberken <[email protected]> wrote: > > > Hi, > > > They need to change due to this, because only HoodieWriteConfig and > *Options will be kept. > > > best, > lamber-ken > > > At 2019-12-14 01:23:35, "Vinoth Chandar" <[email protected]> wrote: > >Hi, > > > >We are trying to understand if existing jobs (datasource, deltastreamer, > >anything else) needs to change due to this. > > > >On Wed, Dec 11, 2019 at 7:18 PM lamberken <[email protected]> wrote: > > > >> > >> > >> Hi, @vinoth > >> > >> > >> 1, Hoodie*Config classes are only used to set default value when call > >> their build method currently. > >> They will be replaced by HoodieMemoryOptions, HoodieIndexOptions, > >> HoodieHBaseIndexOptions, etc... > >> 2, I don't understand the question "It is not clear to me whether there > is > >> any external facing changes which changes this model.". > >> > >> > >> Best, > >> lamber-ken > >> > >> > >> At 2019-12-12 11:01:36, "Vinoth Chandar" <[email protected]> wrote: > >> >I actually prefer the builder pattern for making the configs, because I > >> can > >> >do `builder.` in the IDE and actually see all the options... That said, > >> >most developers program against the Spark datasource and so this may > not > >> be > >> >useful, unless we expose a builder for that.. I will concede that since > >> its > >> >also subjective anyway. > >> > > >> >But, to clarify Siva's question, you do intend to keep the different > >> >component level config classes right - HoodieIndexConfig, > >> >HoodieCompactionConfig? > >> > > >> >Once again, can you please explicitly address the following question, > so > >> we > >> >can get on the same page? > >> >>> It is not clear to me whether there is any external facing changes > >> which > >> >changes this model. > >> >This is still the most critical question from both me and balaji. > >> > > >> >On Wed, Dec 11, 2019 at 11:35 AM lamberken <[email protected]> wrote: > >> > > >> >> hi, @Sivabalan > >> >> > >> >> Yes, thanks very much for help me explain my initial proposal. > >> >> > >> >> > >> >> Answer your question, we can call HoodieWriteConfig as a > SystemConfig, > >> we > >> >> need to pass it everywhere. > >> >> Actually, it may just contains a few custom configurations( does not > >> >> include default configurations) > >> >> Because each component has its own ConfigOptions. > >> >> > >> >> > >> >> The old version HoodieWriteConfig includes all keys(custom > >> configurations, > >> >> default configurations), it is a fat. > >> >> > >> >> > >> >> Best, > >> >> lamber-ken > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> At 2019-12-12 03:14:11, "Sivabalan" <[email protected]> wrote: > >> >> >Let me summarize your initial proposal and then will get into > details. > >> >> >- Introduce ConfigOptions for ease of handling of default values. > >> >> >- Remove all Hoodie*Config classes and just have HoodieWriteConfig. > >> What > >> >> >this means is that, every other config file will be replaced by > >> >> >ConfigOptions. eg, HoodieIndexConfigOption, > >> HoodieCompactionConfigOption, > >> >> >etc. > >> >> >- Config option will take care of returning defaults for any > property, > >> >> even > >> >> >if an entire Config(eg IndexConfig) is not explicitly set. > >> >> > > >> >> >Here are the positives I see. > >> >> >- By way of having component level ConfigOptions, we bucketize the > >> configs > >> >> >and have defaults set(same as before) > >> >> >- User doesn't need to set each component's config(eg IndexConfig) > >> >> >explicitly with HoodieWriteConfig. > >> >> > > >> >> >But have one question: > >> >> >- I see Bucketizing only in write path. How does one get hold of > >> >> >IndexConfigOptions as a consumer? For eg, If some class is using > just > >> >> >IndexConfig alone, how will it consume? From your eg, I see only > >> >> >HoodieWriteConfig. Do we pass in HoodieWriteConfig everywhere then? > >> >> >Wouldn't that contradicts your initial proposal to not have a fat > >> config > >> >> >class? May be can you expand your example below to show how a > consumer > >> of > >> >> >IndexConfig look like. > >> >> > > >> >> >Your eg: > >> >> >/** > >> >> > * New version > >> >> > */ > >> >> >// set value overrite the default value > >> >> >HoodieWriteConfig config = new HoodieWriteConfig(); > >> >> >config.set(HoodieIndexConfigOptions.INDEX_TYPE, > >> >> >HoodieIndex.IndexType.HBASE.name < > >> >> http://hoodieindex.indextype.hbase.name/> > >> >> >()) > >> >> > > >> >> > > >> >> > > >> >> > > >> >> >On Wed, Dec 11, 2019 at 8:33 AM lamberken <[email protected]> > wrote: > >> >> > > >> >> >> > >> >> >> > >> >> >> Hi, > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> On 1,2. Yes, you are right, moving the getter to the component > level > >> >> >> Config class itself. > >> >> >> > >> >> >> > >> >> >> On 3, HoodieWriteConfig can also set value through ConfigOption, > >> small > >> >> >> code snippets. > >> >> >> From the bellow snippets, we can see that clients need to know > each > >> >> >> component's builders > >> >> >> and also call their "with" methods to override the default value > in > >> old > >> >> >> version. > >> >> >> > >> >> >> > >> >> >> But, in new version, clients just need to know each component's > >> public > >> >> >> config options, just like constants. > >> >> >> So, these builders are redundant. > >> >> >> > >> >> >> > >> >> >> > >> >> > >> > /---------------------------------------------------------------------------/ > >> >> >> > >> >> >> > >> >> >> public class HoodieIndexConfigOptions { > >> >> >> public static final ConfigOption<String> INDEX_TYPE = > ConfigOption > >> >> >> .key("hoodie.index.type") > >> >> >> .defaultValue(HoodieIndex.IndexType.BLOOM.name()); > >> >> >> } > >> >> >> > >> >> >> > >> >> >> public class HoodieWriteConfig { > >> >> >> public void setString(ConfigOption<String> option, String > value) { > >> >> >> this.props.put(option.key(), value); > >> >> >> } > >> >> >> } > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> /** > >> >> >> * New version > >> >> >> */ > >> >> >> // set value overrite the default value > >> >> >> HoodieWriteConfig config = new HoodieWriteConfig(); > >> >> >> config.set(HoodieIndexConfigOptions.INDEX_TYPE, > >> >> >> HoodieIndex.IndexType.HBASE.name()) > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> /** > >> >> >> * Old version > >> >> >> */ > >> >> >> HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder() > >> >> >> > >> >> >> > >> >> > >> > builder.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build()) > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> > >> > /---------------------------------------------------------------------------/ > >> >> >> > >> >> >> > >> >> >> Another, users use hudi like bellow, here're all keys. > >> >> >> > >> >> >> > >> >> > >> > /---------------------------------------------------------------------------/ > >> >> >> > >> >> >> > >> >> >> df.write.format("hudi"). > >> >> >> option("hoodie.insert.shuffle.parallelism", "10"). > >> >> >> option("hoodie.upsert.shuffle.parallelism", "10"). > >> >> >> option("hoodie.delete.shuffle.parallelism", "10"). > >> >> >> option("hoodie.bulkinsert.shuffle.parallelism", "10"). > >> >> >> option("hoodie.datasource.write.recordkey.field", "name"). > >> >> >> option("hoodie.datasource.write.partitionpath.field", > >> "location"). > >> >> >> option("hoodie.datasource.write.precombine.field", "ts"). > >> >> >> option("hoodie.table.name", tableName). > >> >> >> mode(Overwrite). > >> >> >> save(basePath); > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> > >> > /---------------------------------------------------------------------------/ > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> Last, as I responsed to @vino, it's reasonable to handle > >> fallbackkeys. I > >> >> >> think we need to do this step by step, > >> >> >> it's easy to integrate FallbackKey in the future, it is not what > we > >> need > >> >> >> right now in my opinion. > >> >> >> > >> >> >> > >> >> >> If some places are still not very clear, feel free to feedback. > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> Best, > >> >> >> lamber-ken > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> At 2019-12-11 23:41:31, "Vinoth Chandar" <[email protected]> > wrote: > >> >> >> >Hi Lamber-ken, > >> >> >> > > >> >> >> >I looked at the sample PR you put up as well. > >> >> >> > > >> >> >> >On 1,2 => Seems your intent is to replace these with moving the > >> getter > >> >> to > >> >> >> >the component level Config class itself? I am fine with that > >> (although > >> >> I > >> >> >> >think its not that big of a hurdle really to use atm). But, once > we > >> do > >> >> >> that > >> >> >> >we could pass just the specific component config into parts of > code > >> >> versus > >> >> >> >passing in the entire HoodieWriteConfig object. I am fine with > >> moving > >> >> the > >> >> >> >classes to a ConfigOption class as you suggested as well. > >> >> >> > > >> >> >> >On 3, I still we feel we will need the builder pattern going > >> forward. > >> >> to > >> >> >> >build the HoodieWriteConfig object. Like below? Cannot understand > >> why > >> >> we > >> >> >> >would want to change this. Could you please clarify? > >> >> >> > > >> >> >> >HoodieWriteConfig.Builder builder = > >> >> >> > > >> >> >> > >> >> > >> > HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes, > >> >> >> >true) > >> >> >> > > >> >> >> > >> >> > >> > .withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName) > >> >> >> > // Inline compaction is disabled for continuous mode. > >> >> >> >otherwise enabled for MOR > >> >> >> > > >> >> >> .withInlineCompaction(cfg.isInlineCompactionEnabled()).build()) > >> >> >> > .forTable(cfg.targetTableName) > >> >> >> > > >> >> >> > >> >> > >> > .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build()) > >> >> >> > .withAutoCommit(false).withProps(props); > >> >> >> > > >> >> >> > > >> >> >> >Typically, we write RFCs for large changes that breaks existing > >> >> behavior > >> >> >> or > >> >> >> >introduces significantly complex new features.. If you are just > >> >> planning > >> >> >> to > >> >> >> >do the refactoring into ConfigOption class, per se you don't > need a > >> >> RFC. > >> >> >> >But , if you plan to address the fallback keys (or) your changes > are > >> >> going > >> >> >> >to break/change existing jobs, we would need a RFC. > >> >> >> > > >> >> >> >>> It is not clear to me whether there is any external facing > >> changes > >> >> >> which > >> >> >> >changes this model. > >> >> >> >I am still unclear on this as well. can you please explicitly > >> clarify? > >> >> >> > > >> >> >> >thanks > >> >> >> >vinoth > >> >> >> > > >> >> >> > > >> >> >> >On Tue, Dec 10, 2019 at 12:35 PM lamberken <[email protected]> > >> wrote: > >> >> >> > > >> >> >> >> > >> >> >> >> Hi, @Balaji @Vinoth > >> >> >> >> > >> >> >> >> > >> >> >> >> I'm sorry, some places are not very clear, > >> >> >> >> > >> >> >> >> > >> >> >> >> 1, We can see that HoodieMetricsConfig, HoodieStorageConfig, > etc.. > >> >> >> already > >> >> >> >> defined in project. > >> >> >> >> But we get property value by methods which defined in > >> >> >> >> HoodieWriteConfig, like > HoodieWriteConfig#getParquetMaxFileSize, > >> >> >> >> HoodieWriteConfig#getParquetBlockSize, etc. It's means that > >> >> >> >> Hoodie*Config are redundant. > >> >> >> >> > >> >> >> >> > >> >> >> >> 2, These Hoodie*Config classes are used to set default value > when > >> >> call > >> >> >> >> their build method, nothing else. > >> >> >> >> > >> >> >> >> > >> >> >> >> 3, For current plan is keep the Builder pattern when > configuring, > >> >> when > >> >> >> we > >> >> >> >> are familiar with the config framework, > >> >> >> >> We will find that Hoodie*Config class are redundant and > methods > >> >> >> >> prefixed with "get" in HoodieWriteConfig are also redundant. > >> >> >> >> > >> >> >> >> > >> >> >> >> In addition, I create a pr[1] for initializing with a demo. At > >> this > >> >> >> demo, > >> >> >> >> I create > >> >> >> >> MetricsGraphiteReporterOptions which contains HOST, PORT, > PREFIX, > >> and > >> >> >> >> remove getGraphiteServerHost, > >> >> >> >> getGraphiteServerPort, getGraphiteMetricPrefix in > >> >> HoodieMetricsConfig. > >> >> >> >> > >> >> >> >> > >> >> >> >> https://github.com/apache/incubator-hudi/pull/1094 > >> >> >> >> > >> >> >> >> > >> >> >> >> Best, > >> >> >> >> lamber-ken > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> At 2019-12-11 02:35:30, "Balaji Varadarajan" > >> >> <[email protected] > >> >> >> > > >> >> >> >> wrote: > >> >> >> >> > Hi Lamber-Ken, > >> >> >> >> >Thanks for the time writing the proposal and thinking about > >> >> improving > >> >> >> >> Hudi usability. > >> >> >> >> >My preference would be to keep the Builder pattern when > >> >> configuring. It > >> >> >> >> is something I find it natural when configuring. It is not > clear > >> to > >> >> me > >> >> >> >> whether there is any external facing changes which changes this > >> >> model. > >> >> >> >> Would you mind adding some more details on the RFC. It would > save > >> >> time > >> >> >> to > >> >> >> >> read it in one place as opposed to checking out github repo :) > >> >> >> >> >Thanks,Balaji.V > >> >> >> >> > On Tuesday, December 10, 2019, 07:55:01 AM PST, Vinoth > >> Chandar < > >> >> >> >> [email protected]> wrote: > >> >> >> >> > > >> >> >> >> > Hi , > >> >> >> >> > > >> >> >> >> >Thanks for the proposal. Some parts I agree, some parts I > don't > >> and > >> >> >> some > >> >> >> >> >parts are unclear > >> >> >> >> > > >> >> >> >> >Agree : > >> >> >> >> >- On introducing a class that binds key, default value, > provided > >> >> value, > >> >> >> >> and > >> >> >> >> >also may be a doc along with it (?). > >> >> >> >> >- Designing the framework to have fallback keys is good IMO. > It > >> >> helps > >> >> >> us > >> >> >> >> do > >> >> >> >> >things like https://issues.apache.org/jira/browse/HUDI-89 > >> >> >> >> > > >> >> >> >> >Disagree : > >> >> >> >> >- Not all configuration values are in HoodieWriteConfig, its > not > >> >> >> accurate. > >> >> >> >> >Configs are already split by components into > HoodieIndexConfig, > >> >> >> >> >HoodieCompactionConfig etc.. > >> >> >> >> >- There are helpers for all these conveniently located in > >> >> >> >> >HoodieWriteConfig. I think some of the claims of usability > seem > >> >> >> subjective > >> >> >> >> >to me, speaking from hands-on experience writing jobs. So, if > you > >> >> >> >> proposing > >> >> >> >> >a large shake up (e.g not have a single properties file load > all > >> >> >> >> >components), I would love to understand this at more depth. > From > >> my > >> >> >> >> >experience, well namespaced configs in a single properties > file > >> >> keeps > >> >> >> it > >> >> >> >> >simple and understandable. > >> >> >> >> > > >> >> >> >> >Unclear : > >> >> >> >> >- What is impact on existing jobs - using RDD/WriteClient > API, > >> >> >> >> DataSource, > >> >> >> >> >DeltaStreamer level? Do you intend to change namespacing of > >> configs? > >> >> >> >> > > >> >> >> >> > > >> >> >> >> >Thanks > >> >> >> >> >Vinoth > >> >> >> >> > > >> >> >> >> >On Tue, Dec 10, 2019 at 6:44 AM lamberken <[email protected]> > >> >> wrote: > >> >> >> >> > > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> Hi, vino > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> Reasonable, we can refactor this step by step. The first > step > >> now > >> >> >> is to > >> >> >> >> >> introduce the config framework. > >> >> >> >> >> When our community is familiar with the config framework > >> >> mechanism, > >> >> >> it's > >> >> >> >> >> easy to integrate FallbackKey in the future. > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> Best, > >> >> >> >> >> lamber-ken > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> At 2019-12-10 11:51:22, "vino yang" <[email protected]> > >> >> wrote: > >> >> >> >> >> >Hi Lamber, > >> >> >> >> >> > > >> >> >> >> >> >Thanks for the proposal. +1 from my side. > >> >> >> >> >> > > >> >> >> >> >> >When it comes to configuration, it will involve how we > handle > >> >> >> >> deprecated > >> >> >> >> >> >configuration items in the future. In my opinion, we need > to > >> take > >> >> >> this > >> >> >> >> >> into > >> >> >> >> >> >consideration when designing. There are already some > >> successful > >> >> >> >> practices > >> >> >> >> >> >for our reference. For example, Flink defines some > deprecated > >> >> >> >> >> >configurations as FallbackKey[1]. Maybe we can learn from > >> these > >> >> >> >> designs. > >> >> >> >> >> > > >> >> >> >> >> >WDYT? > >> >> >> >> >> > > >> >> >> >> >> >[1]: > >> >> >> >> >> > > >> >> >> >> >> > >> >> >> >> > >> >> >> > >> >> > >> > https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/FallbackKey.java > >> >> >> >> >> > > >> >> >> >> >> >Best, > >> >> >> >> >> >Vino > >> >> >> >> >> > > >> >> >> >> >> >lamberken <[email protected]> 于2019年12月9日周一 下午11:19写道: > >> >> >> >> >> > > >> >> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> Hi, all > >> >> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> Currently, many configuration items and their default > values > >> >> are > >> >> >> >> >> dispersed > >> >> >> >> >> >> in the config file like HoodieWriteConfig. It’s very > >> confused > >> >> for > >> >> >> >> >> >> developers, and it's easy for developers to use them in a > >> wrong > >> >> >> place > >> >> >> >> >> >> especially when there are more and more configuration > items. > >> >> If we > >> >> >> >> can > >> >> >> >> >> >> solve this, developers will benefit from it and the code > >> >> structure > >> >> >> >> will > >> >> >> >> >> be > >> >> >> >> >> >> more concise. > >> >> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> I had create a JIRA[1] and a under discuss RFC[2] to > explain > >> >> how > >> >> >> to > >> >> >> >> >> solve > >> >> >> >> >> >> the problem, if you are interested in this, you can visit > >> jira > >> >> and > >> >> >> >> RFC > >> >> >> >> >> for > >> >> >> >> >> >> detail. Any comments and feedback are welcome, WDYT? > >> >> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> Best, > >> >> >> >> >> >> lamber-ken > >> >> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> [1] > >> >> https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375 > >> >> >> >> >> >> [2] > >> >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> > >> >> >> > >> >> > >> > https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project > >> >> >> >> >> > >> >> >> >> > >> >> >> > >> >> > > >> >> > > >> >> >-- > >> >> >Regards, > >> >> >-Sivabalan > >> >> > >> >
