I think it would be very valuable to split out publish_host and bind_host, but it would make sense to do that work as part of a separate JIRA.
I would like to see us in a position to ship 0.4.0 as quickly as possible. Is there a quick solution that reasonably balances security and working out-of-the-box here, Matt?

On Wed, May 3, 2017 at 8:30 AM, Nick Allen <n...@nickallen.org> wrote:

> It only worked "good enough" on Ansible because it was mainly used for
> deploying to a controlled environment where we know the interface names;
> aka Vagrant/Single Node.
>
> It did not work well at all on environments other than Vagrant/Single
> Node. The work that was done with Elasticsearch and Ambari gives us
> significantly more functionality.
>
> The issue now is in getting this to work safely, out-of-the-box on a much
> wider range of platforms; especially ones which will have different network
> setups.
>
> And for the record, in Ansible it simply defaulted to eth0
>
> - elasticsearch_network_interface: eth0
>   <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/defaults/main.yml#L19>
> - 'network.host: ["_{{ elasticsearch_network_interface }}:ipv4_","_local:ipv4_"]
>   <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/tasks/elasticsearch.yml#L69>
>
> On Wed, May 3, 2017 at 7:56 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> How is the ambari service install configuration different from prior
>> configuration through ansible?
>> This used to work better right?
>>
>> On May 3, 2017 at 07:06:52, zeo...@gmail.com (zeo...@gmail.com) wrote:
>>
>> Thanks for the good write up Matt. Here are my thoughts:
>>
>> D1: I don't see a way to have a default that works in every scenario.
>> Documenting this and setting a sane default that works most of the time is
>> probably the best path forward.
>>
>> D2: If we use _local_ and _site_, shouldn't it prioritize site for
>> publishing, like we want?
>> I guess if you have multiple interfaces that fit in site it is not
>> super obvious to an end user which will be specified, although it is
>> programmatic like you mentioned above. Are we specifically trying to
>> bind to a global IP?
>>
>> To reinforce my prior comment, as a system owner who has publicly
>> addressable IPs on systems, I do NOT want _global_ included by default,
>> and thus would strongly advise against using 0.0.0.0 as well. This is
>> asking for trouble.
>>
>> D3: To avoid confusion, I think ES should be configured like ES, and vice
>> versa. Think of people who have well tuned ES systems and want to port
>> their configs into Metron.
>>
>> Another thought - is this handled better if we upgrade ES? Afaik we don't
>> really depend on ES for much, and an upgrade has other benefits, among
>> those being able to natively support periods in field names[1]. I am
>> doubtful this will resolve any of our concerns but figured I'd mention it
>> anyway.
>>
>> In a separate ES related JIRA I'm working on, I will either need to de_dot
>> bro fields in the parser, force the transformation in the Kafka plugin
>> (not preferred), provide an example of how to do this in bro configs (not
>> very obvious to those new to bro/es), give an example of transforming in
>> stellar, or upgrade ES. I'm leaning towards upgrading ES to 2.4 at least,
>> if not 5.x.
>>
>> 1: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html
>>
>> Jon
>>
>> On Wed, May 3, 2017, 1:50 AM Matt Foley <ma...@apache.org> wrote:
>>
>> > Okay, several items that merit discussion:
>> >
>> > Fact A. Experiment shows that the contents of the <value> fields in
>> > elastic-site.xml, and hence the values in Ambari GUI config fields, are
>> > just used as big unquoted Unicode character sequences, including any
>> > quote marks, square brackets or other punctuation, until they are
>> > written into the yaml.j2 template by the {{ }} operator.
>> > Thus, the value:
>> >     ["_eth0_","_lo_"]
>> > is a 17-character Unicode string. Yaml, of course, actually parses the
>> > result.
>> > This is actually nice, it makes it easy to understand and manipulate the
>> > textual content of the field.
>> >
>> > Fact B. In the Hadoop world, config parameters that are lists are
>> > usually single strings containing a sequence of unquoted comma-delimited
>> > substrings with no blank spaces. The substring elements of the list are
>> > forbidden to have commas or anything else that would disrupt fairly
>> > obvious parsing. Parsing is done by apache commons code or plain old
>> > Java. Users are USED to working with these kinds of config params in
>> > Ambari.
>> >
>> > But in Elasticsearch, and some other Metron components, the parsing is
>> > done by Yaml. This means:
>> > - To be a list, square brackets must be provided, either in the value,
>> >   the python processing, or the template. If only one value is provided
>> >   it does not have to be in a list.
>> > - List elements want to be delimited by comma-space, not just comma
>> >   (although it’s not clear whether this actually causes errors with
>> >   non-numeric list elements)
>> > - Quote marks around string list elements are optional except when
>> >   necessary. This greatly increases the opportunity for confusion and
>> >   error.
>> > - Colon is a special character (related to dictionary parsing), so if
>> >   you need a colon in a string, the string needs quote marks. “_local_”
>> >   doesn’t need quote marks; “_local:ipv4_” does require quote marks.
>> >   Character sequences that would mis-parse as poorly formed numbers also
>> >   need quote marks: “0.0.0.0”.
>> >
>> > Fact C. The “network.host” Elasticsearch parameter is a cheat, both way
>> > more powerful and way more limited than one might expect.
>> > It is a cheat because it masks two underlying parameters:
>> > network.bind_host and network.publish_host.
>> > This is all documented at
>> > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
>> > and implemented in
>> > https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/common/network/NetworkService.java
>> > (methods resolveBindHostAddresses() and resolvePublishHostAddresses()).
>> > - network.bind_host is the set of addresses Elasticsearch “binds to”
>> >   (listens on). Supposedly it will actually bind to multiple network
>> >   addresses if available and specified. Whatever set of specifiers you
>> >   give network.host gets expanded into a list of actual bind addresses.
>> >   If you give it the wildcard value (“0.0.0.0” for ipv4), it will bind
>> >   to all available addresses.
>> > - network.publish_host is the address Elasticsearch “publishes” for
>> >   clients and other servers to connect to. It will publish only one
>> >   address. If you give it a set of addresses, it picks the most
>> >   “desirable” of the set: it ensures the address actually is accessible,
>> >   and it prefers ipv4 (or 6, depending on another config), then global,
>> >   then site-local, then link-local, then loopback. Within each category
>> >   it orders by numeric magnitude of the IP address, which is hardly
>> >   meaningful. This means the published address can be wrong on a
>> >   multi-homed server or VM, if you don’t appropriately constrain it.
>> > - The parameter values can be network addresses, network interface
>> >   names, host names (to be dereferenced via DNS), “special” names
>> >   denoting predefined sets of addresses, and combinations of the above.
>> > - Wildcard and loopback addresses are allowed.
>> > - If the wildcard is provided it must be the ONLY value provided (list
>> >   of length == 1), or ES will throw an error.
>> >
>> > Discussion item 1: If you use network.host, the same list of addresses
>> > gets sent to both network.bind_host and network.publish_host.
>> > The algorithm for picking the single publish_host address is not good
>> > enough, at least in ES 2.3, to give certainty that the right address
>> > will be published, on multi-homed servers or VMs (although on
>> > non-multi-homed, it should generally work fine).
>> >
>> > It seems to me that specifying exactly one of _local_, _site_, or
>> > _global_ will usually give the right result, but that too can fail if
>> > the server has multiple addresses within the same category.
>> >
>> > I think network.bind_host and network.publish_host should be separately
>> > configured, as they are with Hadoop.
>> > There’s an article here:
>> > https://community.hortonworks.com/content/kbentry/24277/parameters-for-multi-homing.html
>> > that discusses these issues at some length, and clarifies why they must
>> > be separately configured.
>> >
>> > What do you-all think?
>> >
>> > Discussion item 2: While it’s fine to use 0.0.0.0 for the bind address,
>> > it gives no guidance at all to the needed publish_host value. Using
>> > _local_ for QuickDev and single-node deployments, and _site_ for FullDev
>> > deployments and all cluster deployments, is probably a reasonable choice
>> > for publish_host.
>> >
>> > What do you-all think?
>> >
>> > Discussion item 3: Should we attempt to further the “hadoop style” of
>> > config parameter, and silently add the square brackets and perhaps
>> > substring quotes in python processing? Or should we say users need to
>> > understand ES configuration, and tell them to put the list in square
>> > brackets themselves, if they need a list entry in this parameter, per
>> > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
>> > ?
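
For anyone following along, the publish-address preference described under Fact C and Discussion item 1 can be sketched roughly as below. This is only an illustration of the ordering as stated in the thread, not the actual NetworkService.java logic: the function names are hypothetical, "site-local" is approximated with Python's is_private check, and ascending order within a category is an assumption.

```python
import ipaddress

def category_rank(addr):
    """Lower rank is more desirable: global, site-local, link-local, loopback."""
    if addr.is_loopback:
        return 3
    if addr.is_link_local:
        return 2
    if addr.is_private:
        return 1  # approximation of "site-local" for this sketch
    return 0      # anything else is treated as global

def pick_publish_address(candidates):
    """Pick the single address this sketch would publish from a candidate list."""
    addrs = [ipaddress.ip_address(c) for c in candidates]
    # Prefer IPv4 when any is present (the thread notes ES makes this configurable).
    ipv4 = [a for a in addrs if a.version == 4]
    pool = ipv4 or addrs
    # Ties within a category fall back to the numeric value of the address;
    # ascending order is an assumption made for illustration.
    return str(min(pool, key=lambda a: (category_rank(a), int(a))))

# Two site-local addresses on one host: the choice is driven purely by numeric order.
print(pick_publish_address(["127.0.0.1", "192.168.66.121", "10.0.2.15"]))
```

With two addresses in the same category, nothing in this ordering knows which interface the rest of the cluster can actually reach, which is the multi-homing concern raised above.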
>> >
>> > Please share your thoughts,
>> > Thanks,
>> > --Matt
>> >
>> >
>> > On 5/2/17, 9:57 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:
>> >
>> > Hi Otto,
>> > This event derives from this line of code:
>> > https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/action/support/master/TransportMasterNodeAction.java#L148
>> > which suggests that a cluster action has been requested on a local
>> > (loopback) address. This is not surprising given what I’ve learned about
>> > the semantics of network.host with wildcard address.
>> > See next message, item C. Basically, while the wildcard causes ES to
>> > “listen” on all IP addresses, it only *publishes* one, and on a
>> > multi-homed server it can be the wrong one. I can’t be certain this
>> > causes what you’re seeing, but it seems feasible.
>> >
>> > From: Otto Fowler <ottobackwa...@gmail.com>
>> > Date: Tuesday, May 2, 2017 at 8:30 PM
>> > To: "d...@metron.incubator.apache.org" <d...@metron.incubator.apache.org>,
>> > Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
>> > <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
>> > Subject: Re: Request double-check on Ambari config logic (ES network_host)
>> >
>> > OK.
>> > I tried it using this method, and master ( adding [] ).
>> > In both cases, I can hit 9200 from other machines, but in both cases
>> > I’m getting ES master errors:
>> >
>> > ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
>> > at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:174)
>> > at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:66)
>> > at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:41)
>> > at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:148)
>> > at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.start(TransportMasterNodeAction.java:140)
>> > at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:107)
>> > at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:51)
>> > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
>> > at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:98)
>> > at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:66)
>> > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
>> > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
>> > at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
>> > at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
>> > at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
>> > at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
>> > at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
>> > at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:371)
>> > at org.elasticsearch.rest.action.index.RestIndexAction.handleRequest(RestIndexAction.java:102)
>> > at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
>> > at org.elasticsearch.rest.RestController.executeHandler(RestController.java:205)
>> > at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:166)
>> > at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:128)
>> > at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:86)
>> > at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServ
>> >
>> > and kibana is not good.
>> >
>> > not sure what that error means.
>> > I have 5 nodes, and put es master on #5, with #3,4 as datanodes.
>> >
>> > Sorry, but I don’t think my setup is going to be much help at this
>> > point.
>> >
>> >
>> > On May 2, 2017 at 17:19:43, Matt Foley (mfo...@hortonworks.com) wrote:
>> > The default will now be “0.0.0.0”, and not eth0. And this will work if
>> > suggestions from various community members and a suggestion in the old
>> > 1.x documentation for ES are correct. The 2.x documentation (we specify
>> > ES 2.3) doesn’t mention “0.0.0.0”, but I think it’s likely to still
>> > work, but it needs testing.
>> >
>> > Thanks,
>> > --Matt
>> >
>> > From: Otto Fowler <ottobackwa...@gmail.com>
>> > Date: Tuesday, May 2, 2017 at 11:27 AM
>> > To: "d...@metron.incubator.apache.org" <d...@metron.incubator.apache.org>,
>> > Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
>> > <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
>> > Subject: Re: Request double-check on Ambari config logic (ES network_host)
>> >
>> > Are you saying that the defaults should work now?
>> > Or they should work, but I still need to change the interface from
>> > eth0?
>> >
>> >
>> > On May 2, 2017 at 13:36:11, Matt Foley (mfo...@hortonworks.com) wrote:
>> > Hi Otto,
>> > The basic change to use “0.0.0.0” as the default binding, and put the
>> > square brackets in the template text instead of the parameter value, is
>> > now available in
>> > https://github.com/mattf-horton/incubator-metron branch METRON-905
>> > commit e879719a0c3fb
>> >
>> > I’m having some trouble with my test env, so if you wanted to give it
>> > a try, that would be great.
>> > If the “0.0.0.0” doesn’t work, then we should use
>> > "_local_", "_site_"
>> > that being the ES special values that mean approximately the same.
>> >
>> > I’m going to have to do trial-and-error to determine the exact
>> > behavior of multi-item lists, and then write the python code to strip
>> > redundant square brackets if included in the parameter value.
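
As a rough sketch of the python processing discussed here (stripping redundant square brackets and emitting a quoted list for the template), something like the following might do. It is not the code on the METRON-905 branch; the function name yaml_network_host is hypothetical, and quoting every element is a simplifying choice made for this example, since quotes are always legal in Yaml and sidestep the colon and number-like cases from Fact B.

```python
def yaml_network_host(raw):
    """Turn a comma-delimited value, bracketed or not, into a Yaml flow list."""
    text = raw.strip()
    # Strip one pair of redundant square brackets if the user supplied them.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Split Hadoop-style on commas, dropping stray whitespace and user quotes.
    items = [item.strip().strip("\"'") for item in text.split(",")]
    items = [item for item in items if item]
    if not items:
        raise ValueError("network_host must not be empty")
    # Per Fact C, the wildcard address must be the only entry.
    if "0.0.0.0" in items and len(items) > 1:
        raise ValueError("wildcard 0.0.0.0 must be the only value")
    # Quote every element and delimit with comma-space.
    return "[ " + ", ".join('"%s"' % item for item in items) + " ]"

print(yaml_network_host("_lo:ipv4_,_eth0:ipv4_"))
print(yaml_network_host('["_local_", "_site_"]'))
```

Under this approach the template would print the returned string as-is, so users could enter either the Hadoop-style form or the bracketed ES form.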
>> > Thanks,
>> > --Matt
>> >
>> >
>> > On 5/2/17, 6:44 AM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>> >
>> > I am working on a centos 7 cluster deploy for testing the steps.
>> > I have this issue ( along with the wrong interface name ) and can test
>> > when you have it.
>> >
>> > An eta would help?
>> >
>> >
>> > On May 2, 2017 at 09:14:10, zeo...@gmail.com (zeo...@gmail.com) wrote:
>> >
>> > Are you working on this one? The JIRA doesn't look like it's currently
>> > assigned. Thanks,
>> >
>> > Jon
>> >
>> > On Mon, May 1, 2017 at 6:40 PM Matt Foley <mfo...@hortonworks.com> wrote:
>> >
>> > > Ah, I see I mis-read METRON-897, and Nick specifically says
>> > > "lo:ipv4","eth0:ipv4" did not work for him, but
>> > > ["_lo:ipv4_","_eth0:ipv4_"] did work.
>> > >
>> > > So I went back and dug a little deeper, and realized that in the
>> > > environment where "lo:ipv4","eth0:ipv4" worked for me, I had modified
>> > > the yaml.j2 template to include the square brackets.
>> > >
>> > > So the below theory is wrong. Back to the drawing board.
>> > > Thanks,
>> > > --Matt
>> > >
>> > > On 5/1/17, 3:08 PM, "Matt Foley" <ma...@apache.org> wrote:
>> > >
>> > > Hi, there have been widely varying statements about what needs to be
>> > > in the Elasticsearch config parameter “network_host”. I think I may
>> > > have a rationale for what works and what doesn’t, but I’d like your
>> > > input or correction.
>> > >
>> > > I am focusing on what worked in terms of punctuation (quotes and
>> > > square brackets) with the old _lo:ipv4_,_eth0:ipv4_. I would like to
>> > > ignore for the moment, please, whether eth0 was the correct name for a
>> > > given env, and whether we can use 0.0.0.0. Instead, for systems where
>> > > eth0 WAS the correct name, I’d like to understand what worked and why.
>> > >
>> > > It’s complicated because the value starts out in xml, is read into
>> > > python, printed by jinja, then consumed by yaml.
>> > >
>> > > I think there were two constructs that actually worked for this
>> > > param. Please say whether this is consistent or inconsistent with
>> > > your experience:
>> > >
>> > > "_lo:ipv4_","_eth0:ipv4_"
>> > > This worked for me. I think this was read from XML into python as a
>> > > list of strings, then output in jinja ‘print statement‘
>> > > {{ network_host }} as a python literal list with form:
>> > > [ "_lo:ipv4_", "_eth0:ipv4_" ]
>> > > In other words, the print statement for a python list object injected
>> > > the needed square brackets.
>> > >
>> > > and
>> > > "[ _lo:ipv4_, _eth0:ipv4_ ]"
>> > > Nick and Anand, please confirm if this is the form that worked for
>> > > you. I think this was read from XML into python as a single string,
>> > > and output in the same jinja print statement as:
>> > > [ _lo:ipv4_, _eth0:ipv4_ ]
>> > > because the print statement for a python string object does not
>> > > produce quote marks.
>> > >
>> > > In either case, yaml (the consumer of the jinja output) saw what it
>> > > interprets as a list of strings (since quotes are optional for yaml
>> > > strings).
>> > >
>> > > What didn’t work was:
>> > >
>> > > * "_lo:ipv4_, _eth0:ipv4_"
>> > > This would be read in and output as a single string, and no square
>> > > brackets would ever be introduced.
>> > >
>> > > * _lo:ipv4_, _eth0:ipv4_ or [ _lo:ipv4_, _eth0:ipv4_ ]
>> > > (without quotes) I think the unquoted colons messed up the python
>> > > parsing
>> > >
>> > > Finally, I don’t know whether
>> > > * [ "_lo:ipv4_", "_eth0:ipv4_" ]
>> > > worked or not, I’m not sure anyone ever tried it. By the above logic
>> > > it probably should work.
>> > >
>> > > Please give me your input if you have touched on these issues.
>> > > Thanks,
>> > > --Matt
>> > >
>> > > --
>> >
>> > Jon
>> >
>> --
>>
>> Jon