I like your approach, Matt. +1 from me.

On Wed, May 3, 2017 at 1:32 PM, Matt Foley <ma...@apache.org> wrote:

> Thanks everyone for your input. After sleeping on it, and reviewing Jon’s
> and Nick’s input, here is my proposal:
>
> 1. Keep the parameter network.host as a synonym of network.bind_host.
> This is backward-compatible with our past usage, and completely predictable
> in terms of results.
>
> 2. Add the new parameter network.publish_host. Leave it empty/undefined
> (which will cause it to have the current behavior of picking one of the
> network.host list elements), but document LOUDLY that the admin must
> explicitly set it for multi-homed systems, and any other situations we come
> to understand don’t work well with the defaults.
>
> 3. For single-node and QuickDev deployments, set the default value of
> network.host to _local_ .
> For multi-node and FullDev, set the default value to [ _local_, _site_ ] .
> For the generic Mpack default, use _local_ but document that for cluster
> installs it must be changed to add _site_ .
> We don’t need to worry about the “:ipv4” annotation because ES by default
> prefers IPv4; so we also don’t need to worry about quote marks.
>
> 4. Require that the parameter values be set precisely as Elasticsearch
> requires, without opaque modifications of any sort.
> Document the reference to
> https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
> for understanding; it is really a short, clear bit of docs.
>
> I’m going to start implementing this, and should have it ready to test in
> a couple hours, unless anyone objects or offers an improvement.
> Thanks,
> --Matt
>
>
> On 5/3/17, 6:05 AM, "David Lyle" <dlyle65...@gmail.com> wrote:
>
> Hi Otto,
>
> The Ansible settings were preserved by the mpack when deploying with
> Ansible. Ansible overrides the defaults.
>
> -D...
>
>
> On Wed, May 3, 2017 at 8:53 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > My experience deploying with small_cluster / ansible was that it just
> > worked at the time to
> > my centos 6.9 esxi cluster.
> >
> >
> > On May 3, 2017 at 08:30:59, Nick Allen (n...@nickallen.org) wrote:
> >
> > It only worked "good enough" on Ansible because it was mainly used for
> > deploying to a controlled environment where we know the interface names;
> > aka Vagrant/Single Node.
> >
> > It did not work well at all on environments other than Vagrant/Single
> > Node. The work that was done with Elasticsearch and Ambari gives us
> > significantly more functionality.
> >
> > The issue now is in getting this to work safely, out-of-the-box on a much
> > wider range of platforms; especially ones which will have different network
> > setups.
> >
> > And for the record, in Ansible it simply defaulted to eth0
> >
> > - elasticsearch_network_interface: eth0
> > <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/defaults/main.yml#L19>
> >
> > - 'network.host: ["_{{ elasticsearch_network_interface }}:ipv4_","_local:ipv4_"]
> > <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/tasks/elasticsearch.yml#L69>
> >
> >
> > On Wed, May 3, 2017 at 7:56 AM, Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> > > How is the ambari service install configuration different from prior
> > > configuration through ansible?
> > > This used to work better right?
> > >
> > >
> > > On May 3, 2017 at 07:06:52, zeo...@gmail.com (zeo...@gmail.com) wrote:
> > >
> > > Thanks for the good write up Matt. Here are my thoughts:
> > >
> > > D1: I don't see a way to have a default that works in every scenario.
> > > Documenting this and setting a sane default that works most of the time is
> > > probably the best path forward.
> > >
> > > D2: If we use _local_ and _site_, shouldn't it prioritize site for
> > > publishing, like we want?
> > > I guess if you have multiple interfaces that fit
> > > in site it is not super obvious to an end user which will be specified,
> > > although it is programmatic like you mentioned above. Are we specifically
> > > trying to bind to a global IP?
> > >
> > > To reinforce my prior comment, as a system owner who has publicly
> > > addressable IPs on systems, I do NOT want _global_ included by default, and
> > > thus would strongly advise against using 0.0.0.0 as well. This is asking for
> > > trouble.
> > >
> > > D3: To avoid confusion, I think ES should be configured like ES, and vice
> > > versa. Think of people who have well tuned ES systems and want to port
> > > their configs into Metron.
> > >
> > > Another thought - is this handled better if we upgrade ES? Afaik we don't
> > > really depend on ES for much, and an upgrade has other benefits, among
> > > those being able to natively support periods in field names[1]. I am
> > > doubtful this will resolve any of our concerns but figured I'd mention it
> > > anyway.
> > >
> > > In a separate ES related JIRA I'm working on, I will either need to de_dot
> > > bro fields in the parser, force the transformation in the Kafka plugin (not
> > > preferred), provide an example of how to do this in bro configs (not very
> > > obvious to those new to bro/es), give an example of transforming in
> > > stellar, or upgrade ES. I'm leaning towards upgrading ES to 2.4 at least,
> > > if not 5.x.
> > >
> > > 1: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html
> > >
> > > Jon
> > >
> > > On Wed, May 3, 2017, 1:50 AM Matt Foley <ma...@apache.org> wrote:
> > >
> > > > Okay, several items that merit discussion:
> > > >
> > > > Fact A.
> > > > Experiment shows that the contents of the <value> fields in
> > > > elastic-site.xml, and hence the values in Ambari GUI config fields, are
> > > > just used as big unquoted Unicode character sequences, including any quote
> > > > marks, square brackets or other punctuation, until they are written into
> > > > the yaml.j2 template by the {{ }} operator. Thus, the value:
> > > > ["_eth0_","_lo_"]
> > > > is a 17-character Unicode string. Yaml, of course, actually parses the
> > > > result.
> > > > This is actually nice; it makes it easy to understand and manipulate the
> > > > textual content of the field.
> > > >
> > > > Fact B. In the Hadoop world, config parameters that are lists are usually
> > > > single strings containing a sequence of unquoted comma-delimited substrings
> > > > with no blank spaces. The substring elements of the list are forbidden to
> > > > have commas or anything else that would disrupt fairly obvious parsing.
> > > > Parsing is done by apache commons code or plain old Java. Users are USED
> > > > to working with these kinds of config params in Ambari.
> > > >
> > > > But in Elasticsearch, and some other Metron components, the parsing is
> > > > done by Yaml. This means:
> > > > - To be a list, square brackets must be provided, either in the value,
> > > >   the python processing, or the template. If only one value is provided it
> > > >   does not have to be in a list.
> > > > - List elements want to be delimited by comma-space, not just comma
> > > >   (although it’s not clear whether this actually causes errors with
> > > >   non-numeric list elements)
> > > > - Quote marks around string list elements are optional except when
> > > >   necessary. This greatly increases the opportunity for confusion and error.
> > > > - Colon is a special character (related to dictionary parsing), so if
> > > >   you need a colon in a string, the string needs quote marks.
> > > >   “_local_” doesn’t need quote marks; “_local:ipv4_” does require quote marks.
> > > >   Character sequences that would mis-parse as poorly formed numbers also need
> > > >   quote marks: “0.0.0.0”.
> > > >
> > > > Fact C. The “network.host” Elasticsearch parameter is a cheat, both way
> > > > more powerful and way more limited than one might expect.
> > > > It is a cheat because it masks two underlying parameters:
> > > > network.bind_host and network.publish_host. This is all documented at
> > > > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
> > > > and implemented in
> > > > https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/common/network/NetworkService.java
> > > > (methods resolveBindHostAddresses() and resolvePublishHostAddresses()).
> > > > - network.bind_host is the set of addresses Elasticsearch “binds to”
> > > >   (listens on). Supposedly it will actually bind to multiple network
> > > >   addresses if available and specified. Whatever set of specifiers you gave
> > > >   network.host gets expanded into a list of actual bind addresses. If you
> > > >   give it the wildcard value (“0.0.0.0” for ipv4), it will bind to all
> > > >   available addresses.
> > > > - network.publish_host is the address Elasticsearch “publishes” for
> > > >   clients and other servers to connect to. It will publish only one address.
> > > >   If you give it a set of addresses, it picks the most “desirable” of the set:
> > > >   it ensures the address actually is accessible, and it prefers ipv4 (or 6,
> > > >   depending on another config), then global, then site-local, then
> > > >   link-local, then loopback. Within each category it orders by numeric
> > > >   magnitude of the IP address, which is hardly meaningful. This means the
> > > >   published address can be wrong on a multi-homed server or VM, if you don’t
> > > >   appropriately constrain it.
> > > > - The parameter values can be network addresses, network interface
> > > >   names, host names (to be dereferenced via DNS), “special” names denoting
> > > >   predefined sets of addresses, and combinations of the above.
> > > > - Wildcard and loopback addresses are allowed.
> > > > - If the wildcard is provided it must be the ONLY value provided (list
> > > >   of length == 1), or ES will throw an error.
> > > >
> > > > Discussion item 1: If you use network.host, the same list of addresses
> > > > gets sent to both network.bind_host and network.publish_host. The algorithm
> > > > for picking the single publish_host address is not good enough, at least in
> > > > ES 2.3, to give certainty that the right address will be published on
> > > > multi-homed servers or VMs (although on non-multi-homed, it should
> > > > generally work fine).
> > > >
> > > > It seems to me that specifying exactly one of _local_, _site_, or _global_
> > > > will usually give the right result, but that too can fail if the server has
> > > > multiple addresses within the same category.
> > > >
> > > > I think network.bind_host and network.publish_host should be separately
> > > > configured, as they are with Hadoop.
> > > > There’s an article here:
> > > > https://community.hortonworks.com/content/kbentry/24277/parameters-for-multi-homing.html
> > > > that discusses these issues at some length, and clarifies why they must be
> > > > separately configured.
> > > >
> > > > What do you-all think?
> > > >
> > > > Discussion item 2: While it’s fine to use 0.0.0.0 for the bind address,
> > > > it gives no guidance at all to the needed publish_host value. Using _local_
> > > > for QuickDev and single-node deployments, and _site_ for FullDev
> > > > deployments and all cluster deployments, is probably a reasonable choice
> > > > for publish_host.
> > > >
> > > > What do you-all think?
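
For anyone trying to reason about which address gets published on a multi-homed box, the ordering described under Fact C can be sketched in a few lines of Python. This is only an illustration of the description in this thread, not the Elasticsearch code: the function names are made up, Python's is_private is used as a stand-in for "site-local", the reachability check is left out, and picking the numerically lowest address within a category is an assumption (the thread only says the ordering is by numeric magnitude).

```python
import ipaddress


def scope_rank(addr):
    """Lower rank means more "desirable", following the order given in Fact C."""
    if addr.is_loopback:
        return 3
    if addr.is_link_local:
        return 2
    if addr.is_private:  # assumption: stand-in for "site-local"
        return 1
    return 0  # everything else is treated as global


def pick_publish_address(candidates, prefer_ipv4=True):
    """Pick one address from the candidates, as a publish_host choice might.

    Hypothetical helper: prefers one IP version, then scope, then the
    numerically lowest address (the direction is an assumption).
    """
    preferred_version = 4 if prefer_ipv4 else 6
    addrs = [ipaddress.ip_address(c) for c in candidates]

    def sort_key(a):
        return (a.version != preferred_version, scope_rank(a), int(a))

    return str(min(addrs, key=sort_key))


# Two site-local addresses on one host: the choice between them is arbitrary.
multi_homed = ["127.0.0.1", "192.168.1.10", "10.0.0.5", "fe80::1"]
print(pick_publish_address(multi_homed))  # prints 10.0.0.5
```

With two site-local interfaces the result depends only on which address sorts first, which is the multi-homing problem raised in Discussion item 1. Adding a public address to the list makes it win over both, which is relevant to the concern about including _global_ by default.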
> > > >
> > > > Discussion item 3: Should we attempt to further the “hadoop style” of
> > > > config parameter, and silently add the square brackets and perhaps
> > > > substring quotes in python processing? Or should we say users need to
> > > > understand ES configuration, and tell them to put the list in square
> > > > brackets themselves, if they need a list entry in this parameter, per
> > > > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
> > > > ?
> > > >
> > > > Please share your thoughts,
> > > > Thanks,
> > > > --Matt
> > > >
> > > >
> > > > On 5/2/17, 9:57 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:
> > > >
> > > > Hi Otto,
> > > > This event derives from this line of code:
> > > > https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/action/support/master/TransportMasterNodeAction.java#L148
> > > > which suggests that a cluster action has been requested on a local
> > > > (loopback) address. This is not
> > > > surprising given what I’ve learned about the semantics of network.host
> > > > with wildcard address.
> > > > See next message, item C. Basically, while the wildcard causes ES to
> > > > “listen” on all IP addresses, it
> > > > only *publishes* one, and on a multi-homed server it can be the wrong
> > > > one. I can’t be certain
> > > > this causes what you’re seeing, but it seems feasible.
> > > >
> > > > From: Otto Fowler <ottobackwa...@gmail.com>
> > > > Date: Tuesday, May 2, 2017 at 8:30 PM
> > > > To: "d...@metron.incubator.apache.org" <dev@metron.incubator.apache.org>,
> > > > Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
> > > > <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
> > > > Subject: Re: Request double-check on Ambari config logic (ES
> > > > network_host)
> > > >
> > > > OK.
> > > > I tried it using this method, and master ( adding [] ).
> > > > In both cases, I can hit 9200 from other machines, but in both cases I’m
> > > > getting ES master errors:
> > > >
> > > > ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
> > > > at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:174)
> > > > at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:66)
> > > > at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:41)
> > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:148)
> > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.start(TransportMasterNodeAction.java:140)
> > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:107)
> > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:51)
> > > > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
> > > > at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:98)
> > > > at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:66)
> > > > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
> > > > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
> > > > at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
> > > > at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
> > > > at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
> > > > at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
> > > > at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
> > > > at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:371)
> > > > at org.elasticsearch.rest.action.index.RestIndexAction.handleRequest(RestIndexAction.java:102)
> > > > at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
> > > > at org.elasticsearch.rest.RestController.executeHandler(RestController.java:205)
> > > > at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:166)
> > > > at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:128)
> > > > at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:86)
> > > > at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServ
> > > >
> > > > and kibana is not good.
> > > >
> > > > not sure what that error means.
> > > > I have 5 nodes, and put es master on #5, with #3,4 as datanodes.
> > > >
> > > > Sorry, but I don’t think my setup is going to be much help at this
> > > > point.
> > > >
> > > >
> > > > On May 2, 2017 at 17:19:43, Matt Foley (mfo...@hortonworks.com) wrote:
> > > > The default will now be “0.0.0.0”, and not eth0. And this will work if
> > > > suggestions from various community members and a suggestion in the old 1.x
> > > > documentation for ES are correct.
> > > > The 2.x documentation (we specify ES 2.3)
> > > > doesn’t mention “0.0.0.0”. I think it’s likely to still work, but it
> > > > needs testing.
> > > >
> > > > Thanks,
> > > > --Matt
> > > >
> > > > From: Otto Fowler <ottobackwa...@gmail.com>
> > > > Date: Tuesday, May 2, 2017 at 11:27 AM
> > > > To: "d...@metron.incubator.apache.org" <dev@metron.incubator.apache.org>,
> > > > Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
> > > > <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
> > > > Subject: Re: Request double-check on Ambari config logic (ES
> > > > network_host)
> > > >
> > > > Are you saying that the defaults should work now?
> > > > Or they should work, but I still need to change the interface from
> > > > eth0?
> > > >
> > > >
> > > > On May 2, 2017 at 13:36:11, Matt Foley (mfo...@hortonworks.com) wrote:
> > > > Hi Otto,
> > > > The basic change to use “0.0.0.0” as the default binding, and put the
> > > > square brackets in the template text instead of the parameter value, is now
> > > > available in
> > > > https://github.com/mattf-horton/incubator-metron branch METRON-905
> > > > commit e879719a0c3fb
> > > >
> > > > I’m having some trouble with my test env, so if you wanted to give it
> > > > a try, that would be great.
> > > > If the “0.0.0.0” doesn’t work, then we should use
> > > > "_local_", "_site_"
> > > > that being the ES special values that mean approximately the same.
> > > >
> > > > I’m going to have to do trial-and-error to determine the exact
> > > > behavior of multi-item lists, and then write the python code to strip
> > > > redundant square brackets if included in the parameter value.
> > > > Thanks,
> > > > --Matt
> > > >
> > > >
> > > > On 5/2/17, 6:44 AM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
> > > >
> > > > I am working on a centos 7 cluster deploy for testing the steps.
> > > > I have this issue ( along with the wrong interface name ) and can test
> > > > when you have it.
> > > >
> > > > An eta would help?
> > > >
> > > >
> > > > On May 2, 2017 at 09:14:10, zeo...@gmail.com (zeo...@gmail.com) wrote:
> > > >
> > > > Are you working on this one? The JIRA doesn't look like it's currently
> > > > assigned. Thanks,
> > > >
> > > > Jon
> > > >
> > > > On Mon, May 1, 2017 at 6:40 PM Matt Foley <mfo...@hortonworks.com> wrote:
> > > >
> > > > > Ah, I see I mis-read METRON-897, and Nick specifically says
> > > > > "lo:ipv4","eth0:ipv4" did not work for him, but ["_lo:ipv4_","_eth0:ipv4_"]
> > > > > did work.
> > > > >
> > > > > So I went back and dug a little deeper, and realized that in the
> > > > > environment where "lo:ipv4","eth0:ipv4" worked for me, I had modified the
> > > > > yaml.j2 template to include the square brackets.
> > > > >
> > > > > So the below theory is wrong. Back to the drawing board.
> > > > > Thanks,
> > > > > --Matt
> > > > >
> > > > > On 5/1/17, 3:08 PM, "Matt Foley" <ma...@apache.org> wrote:
> > > > >
> > > > > Hi, there have been widely varying statements about what needs to be
> > > > > in the Elasticsearch config parameter “network_host”. I think I may have a
> > > > > rationale for what works and what doesn’t, but I’d like your input or
> > > > > correction.
> > > > >
> > > > > I am focusing on what worked in terms of punctuation (quotes and
> > > > > square brackets) with the old _lo:ip4_,_eth0:ip4_. I would like to ignore
> > > > > for the moment, please, whether eth0 was the correct name for a given env,
> > > > > and whether we can use 0.0.0.0. Instead, for systems where eth0 WAS the
> > > > > correct name, I’d like to understand what worked and why.
> > > > >
> > > > > It’s complicated because the value starts out in xml, is read into
> > > > > python, printed by jinja, then consumed by yaml.
> > > > >
> > > > > I think there were two constructs that actually worked for this
> > > > > param. Please say whether this is consistent or inconsistent with your
> > > > > experience:
> > > > >
> > > > > "_lo:ip4_","_eth0:ip4_"
> > > > > This worked for me. I think this was read from XML into python as a
> > > > > list of strings, then output in jinja ‘print statement‘
> > > > > {{ network_host }} as a python literal list with form:
> > > > > [ "_lo:ip4_", "_eth0:ip4_" ]
> > > > > In other words, the print statement for a python list object injected
> > > > > the needed square brackets.
> > > > >
> > > > > and
> > > > > "[ _lo:ip4_, _eth0:ip4_ ]"
> > > > > Nick and Anand, please confirm if this is the form that worked for
> > > > > you.
> > > > > I think this was read from XML into python as a single string, and
> > > > > output in the same jinja print statement as:
> > > > > [ _lo:ip4_, _eth0:ip4_ ]
> > > > > because the print statement for a python string object does not
> > > > > produce quote marks.
> > > > >
> > > > > In either case, yaml (the consumer of the jinja output) saw what it
> > > > > interprets as a list of strings (since quotes are optional for yaml
> > > > > strings).
> > > > >
> > > > > What didn’t work was:
> > > > >
> > > > > * "_lo:ip4_, _eth0:ip4_"
> > > > > This would be read in and output as a single string, and no square
> > > > > brackets would ever be introduced.
> > > > >
> > > > > * _lo:ip4_, _eth0:ip4_ or [ _lo:ip4_, _eth0:ip4_ ]
> > > > > (without quotes) I think the unquoted colons messed up the python
> > > > > parsing
> > > > >
> > > > > Finally, I don’t know whether
> > > > > * [ "_lo:ip4_", "_eth0:ip4_" ]
> > > > > worked or not, I’m not sure anyone ever tried it. By the above logic
> > > > > it probably should work.
> > > > >
> > > > > Please give me your input if you have touched on these issues.
> > > > > Thanks,
> > > > > --Matt
> > > > >
> > > >
> > > > --
> > > >
> > > > Jon
> > >
> > > --
> > >
> > > Jon
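
The follow-ups higher up in this thread (the 6:40 PM correction and Fact A) conclude that the parameter value travels as one plain string until the template writes it out, so square brackets appear only if the value or the template supplies them. A minimal sketch of that behavior follows. It uses Python's standard string.Template purely as a stand-in for the Jinja {{ }} operator, and the two one-line templates and the render() helper are hypothetical, not the actual yaml.j2 content.

```python
from string import Template

# Hypothetical one-line templates; "$network_host" stands in for
# Jinja's {{ network_host }} in the real yaml.j2 file.
TEMPLATE_PLAIN = Template("network.host: $network_host")
TEMPLATE_BRACKETED = Template("network.host: [ $network_host ]")


def render(template, value):
    """Substitute the raw config string verbatim; nothing is parsed or quoted."""
    return template.substitute(network_host=value)


# The value exactly as it would be typed into the config field.
value = '"_lo:ipv4_","_eth0:ipv4_"'

print(render(TEMPLATE_PLAIN, value))
# prints: network.host: "_lo:ipv4_","_eth0:ipv4_"
print(render(TEMPLATE_BRACKETED, value))
# prints: network.host: [ "_lo:ipv4_","_eth0:ipv4_" ]
```

Only the second rendering hands the YAML parser a bracketed list, which matches the observation that the quoted, unbracketed form worked only in the environment where the template had been modified to include the square brackets.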