I realize it would useful to pick up configuration details from all ancestor classes as well, so I added that as well.
So the set of sources from which a configuration value can be sourced are (in preference order) 1) full canonical name of class i.e. org.apache.streams.twitter.config.TwitterEngagersProviderConfiguration 2) simple name of class i.e. just TwitterEngagersProviderConfiguration 3) any ancestor Class, closest ancestors preferred, i.e. org.apache.streams.twitter.config.TwitterTimelineProviderConfiguration, then org.apache.streams.twitter.config.TwitterUserInformationConfiguration, then org.apache.streams.twitter.config.TwitterConfiguration (java.lang.Object is immune from this treatment) 4) the package of the Class, i.e. org.apache.streams.twitter.config 5) all parent packages of the Class, closest packages preferred, i.e. org.apache.streams.twitter, then org.apache.streams, then org.apache, then org I look forward to getting some eyes, comments, and +1’s on this work, as it will unlock some in-progress features and make configuration of code build with streams far more flexible going forward! Thanks, sblack...@apache.org On Feb 1, 2018 at 12:06 PM, Steve Blackmon <sblack...@apache.org> wrote: I think I’ve come up with a very nice solution to this problem. - Add a new method to ComponentConfigurator - detectConfiguration() - When the caller does not provide a Config or a path to detectConfiguration, get fancy: - Search for each of the fields declared by the component POJO class on each of the following: - the SimpleClassName - the CanonicalClassName - each ancestor package of the CanonicalClassName, longest to shortest - if a field is specified at more than one package/class level, the class or longest package ancestor takes precedence. I created STREAMS-580 and submitted a PR: https://github.com/apache/streams/pull/421 The unit tests appear to be working as described above - please check it out and let me know if anything looks off. P.S. Note that after merging this capability, we’d still need to migrate each component that currently hard-codes its configuration path to adopt this method instead. On Jan 31, 2018 at 7:47 PM, Steve Blackmon <sblack...@apache.org> wrote: TL;DR I think we should start aligning the default JVM config path with the package/class of the code being configured Hello All, I’ve been working on two providers (TwitterEngagersProvider and InstagramEngagersProvider) which share significant code with other providers. This is sort of complex but here’s the nature of the implementation: a) Instantiate and run provider A, which generates user objects or user ids b) Instantiate and run provider B, using the output of provider A as the input c) All of this happens in provider C, which then transforms the output of provider B, exploding each item into a list as it’s own output. Due mostly to conventions that have been in the project for a while, providers A, B, and C all expect to find their configuration at the same place in the JVM properties - either ‘twitter’ or ‘instagram’. Additionally, providers A, B, and C all expect to be configured with a max_items parameter. So you can probably see the challenge - configuring provider C with max_items = 1000 winds up giving unhelpful direction to providers A and B about how much work they should do. We don’t need or want 1000 user ids, we really want 1000 engager ids which can typically be derived from ~50 user ids. While on the surface its nice to be able to reuse a single conf file to run a variety of providers, I see now that creates problems when running code that integrates a variety of components that expect to be configured from overlapping paths. It’s possible to get around these problems by writing glue code, but it seems like it would be smarter to have all classes by default source their configuration using a simple and obvious namespace strategy based on package/class. So instead of every twitter provider picking up: twitter.max_items = 1000 They would all be configured separately like: org.apache.streams.twitter.providers.ProviderA.max_items = 10 org.apache.streams.twitter.providers.ProviderB.max_items = 100 org.apache.streams.twitter.providers.ProviderC.max_items = 1000 Since there are other configuration properties that are required but do not differ between the three providers, a more typical configuration file would really look like: org.apache.streams.twitter.Twitter = { oauth = { … } } org.apache.streams.twitter.providers.ProviderA = ${org.apache.streams.twitter.Twitter} org.apache.streams.twitter.providers.ProviderA.max_items = 10 org.apache.streams.twitter.providers.ProviderB = ${org.apache.streams.twitter.Twitter} org.apache.streams.twitter.providers.ProviderB.max_items = 100 org.apache.streams.twitter.providers.ProviderC = ${org.apache.streams.twitter.Twitter} org.apache.streams.twitter.providers.ProviderC.max_items = 1000 This would be a breaking refactoring, targeted for the 0.6.0 release. I’m interested in others feedback on the idea in general, and specifically. For example whether fully qualified class name or just simple name is preferable as a prefix. And whether the class name used should be that of the configuration bean i.e. TwitterConfiguration or of the class that operates on it i.e. Twitter Thanks for reading, Steve