Thanks everyone for your input.  After sleeping on it, and reviewing Jon’s and 
Nick’s input, here is my proposal:

1. Keep the parameter network.host as a synonym of network.bind_host.  This is 
backward-compatible with our past usage, and completely predictable in terms of 
results.

2. Add the new parameter network.publish_host.  Leave it empty/undefined by 
default, which preserves the current behavior of picking one of the 
network.host list elements, but document LOUDLY that the admin must set it 
explicitly for multi-homed systems, and for any other situations we later 
find don’t work well with the defaults.
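
To make the multi-homed risk concrete, here is a rough Python sketch of the 
selection order as described in Fact C in the quoted message below. It is 
not the ES code. The function names are hypothetical, and the ascending 
numeric tie-break within a category is an assumption.

```python
import ipaddress


def preference_rank(addr):
    """Lower rank is more preferred: global, site-local, link-local, loopback."""
    if addr.is_loopback:
        return 3
    if addr.is_link_local:
        return 2
    # Checked last because Python also reports loopback and link-local
    # addresses as "private".
    if addr.is_private:
        return 1
    return 0


def pick_publish_address(candidates):
    """Pick one address: IPv4 first, then by category, then by numeric value.

    The ascending numeric tie-break is an assumption made for illustration.
    """
    addrs = [ipaddress.ip_address(c) for c in candidates]
    best = min(addrs, key=lambda a: (a.version != 4, preference_rank(a), int(a)))
    return str(best)


def effective_publish_host(network_host_addrs, publish_host=None):
    """An explicit publish_host wins; otherwise fall back to the automatic pick."""
    if publish_host:
        return publish_host
    return pick_publish_address(network_host_addrs)


# Two site-local addresses on a multi-homed box: the pick is purely numeric.
print(effective_publish_host(["127.0.0.1", "192.168.1.10", "10.0.0.5"]))
# prints 10.0.0.5

# With publish_host set explicitly, the admin's choice is used as-is.
print(effective_publish_host(["127.0.0.1", "192.168.1.10", "10.0.0.5"],
                             publish_host="192.168.1.10"))
# prints 192.168.1.10
```

In the first call nothing tells the picker that 192.168.1.10 is the address 
the rest of the cluster can reach, which is why an explicit publish_host is 
needed on multi-homed systems.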

3. For single-node and QuickDev deployments, set the default value of 
network.host to _local_.
For multi-node and FullDev deployments, set the default value to 
[ _local_, _site_ ].
For the generic Mpack default, use _local_, but document that for cluster 
installs it must be changed to add _site_.
We don’t need the “:ipv4” annotation because ES prefers IPv4 by default; 
and without the colon, the values don’t need quote marks either.

4. Require that the parameter values be set exactly as Elasticsearch 
requires, with no opaque modification of any sort on our side.
In our docs, point to 
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
for background; it is a short, clear piece of documentation.
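
As a minimal sketch of items 3 and 4 together, the Python below writes the 
configured text into the output unchanged. All names here are hypothetical 
(the deployment keys, the table, and the function); the real Mpack code and 
yaml.j2 template may look quite different.

```python
# Hypothetical table of the proposed defaults; the keys are illustrative labels.
DEFAULT_NETWORK_HOST = {
    "single-node": "_local_",
    "quickdev": "_local_",
    "multi-node": "[ _local_, _site_ ]",
    "fulldev": "[ _local_, _site_ ]",
    "mpack-generic": "_local_",  # must be changed to add _site_ for clusters
}


def render_network_settings(network_host, publish_host=""):
    """Emit the settings lines with the values passed through verbatim.

    No brackets or quotes are added or stripped: what the admin typed is
    what ends up in the rendered text.
    """
    lines = ["network.host: " + network_host]
    if publish_host:  # empty/undefined means the line is omitted
        lines.append("network.publish_host: " + publish_host)
    return "\n".join(lines)


print(render_network_settings(DEFAULT_NETWORK_HOST["fulldev"]))
# prints: network.host: [ _local_, _site_ ]

print(render_network_settings("[ _local_, _site_ ]", publish_host="_site_"))
# prints:
# network.host: [ _local_, _site_ ]
# network.publish_host: _site_
```

Because nothing is rewritten on the way through, a value that is valid in an 
Elasticsearch config can be pasted into the parameter as-is.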

I’m going to start implementing this, and should have it ready to test in a 
couple of hours, unless anyone objects or offers an improvement.
Thanks,
--Matt


On 5/3/17, 6:05 AM, "David Lyle" <dlyle65...@gmail.com> wrote:

    Hi Otto,
    
    The Ansible settings were preserved by the mpack when deploying with
    Ansible. Ansible overrides the defaults.
    
    -D...
    
    
    On Wed, May 3, 2017 at 8:53 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:
    
    > My experience deploying with small_cluster / ansible was that it just
    > worked at the time to
    > my centos 6.9 esxi cluster.
    >
    >
    > On May 3, 2017 at 08:30:59, Nick Allen (n...@nickallen.org) wrote:
    >
    > It only worked "good enough" on Ansible because it was mainly used for
    > deploying to a controlled environment where we know the interface names;
    > aka Vagrant/Single Node.
    >
    > It did not work well at all on environments other than Vagrant/Single
    > Node. The work that was done with Elasticsearch and Ambari gives us
    > significantly more functionality.
    >
    > The issue now is in getting this to work safely, out-of-the-box on a much
    > wider range of platforms; especially ones which will have different
    > network setups.
    >
    > And for the record, in Ansible it simply defaulted to eth0
    >
    > - elasticsearch_network_interface: eth0
    > <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/defaults/main.yml#L19>
    >
    > - 'network.host: ["_{{ elasticsearch_network_interface }}:ipv4_","_local:ipv4_"]
    > <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/tasks/elasticsearch.yml#L69>
    >
    >
    >
    >
    >
    > On Wed, May 3, 2017 at 7:56 AM, Otto Fowler <ottobackwa...@gmail.com>
    > wrote:
    >
    > > How is the ambari service install configuration different from prior
    > > configuration through ansible?
    > > This used to work better right?
    > >
    > >
    > > On May 3, 2017 at 07:06:52, zeo...@gmail.com (zeo...@gmail.com) wrote:
    > >
    > > Thanks for the good write up Matt. Here are my thoughts:
    > >
    > > D1: I don't see a way to have a default that works in every scenario.
    > > Documenting this and setting a sane default that works most of the time
    > > is probably the best path forward.
    > >
    > > D2: If we use _local_ and _site_, shouldn't it prioritize site for
    > > publishing, like we want? I guess if you have multiple interfaces that
    > > fit in site it is not super obvious to an end user which will be
    > > specified, although it is programmatic like you mentioned above. Are we
    > > specifically trying to bind to a global IP?
    > >
    > > To reinforce my prior comment, as a system owner who has publicly
    > > addressable IPs on systems, I do NOT want _global_ included by default,
    > > and thus would strongly advise against using 0.0.0.0 as well. This is
    > > asking for trouble.
    > >
    > > D3: To avoid confusion, I think ES should be configured like ES, and
    > > vice versa. Think of people who have well tuned ES systems and want to
    > > port their configs into Metron.
    > >
    > > Another thought - is this handled better if we upgrade ES? Afaik we
    > > don't really depend on ES for much, and an upgrade has other benefits,
    > > among those being able to natively support periods in field names[1].
    > > I am doubtful this will resolve any of our concerns but figured I'd
    > > mention it anyway.
    > >
    > > In a separate ES related JIRA I'm working on, I will either need to
    > > de_dot bro fields in the parser, force the transformation in the Kafka
    > > plugin (not preferred), provide an example of how to do this in bro
    > > configs (not very obvious to those new to bro/es), give an example of
    > > transforming in stellar, or upgrade ES. I'm leaning towards upgrading ES
    > > to 2.4 at least, if not 5.x.
    > >
    > > 1:
    > > https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html
    > >
    > > Jon
    > >
    > > On Wed, May 3, 2017, 1:50 AM Matt Foley <ma...@apache.org> wrote:
    > >
    > > > Okay, several items that merit discussion:
    > > >
    > > > Fact A. Experiment shows that the contents of the <value> fields in
    > > > elastic-site.xml, and hence the values in Ambari GUI config fields,
    > > > are just used as big unquoted Unicode character sequences, including
    > > > any quote marks, square brackets or other punctuation, until they are
    > > > written into the yaml.j2 template by the {{ }} operator. Thus, the
    > > > value:
    > > > ["_eth0_","_lo_"]
    > > > is a 17-character Unicode string. Yaml, of course, actually parses the
    > > > result.
    > > > This is actually nice; it makes it easy to understand and manipulate
    > > > the textual content of the field.
    > > >
    > > > Fact B. In the Hadoop world, config parameters that are lists are
    > > > usually single strings containing a sequence of unquoted
    > > > comma-delimited substrings with no blank spaces. The substring
    > > > elements of the list are forbidden to have commas or anything else
    > > > that would disrupt fairly obvious parsing. Parsing is done by apache
    > > > commons code or plain old Java. Users are USED to working with these
    > > > kinds of config params in Ambari.
    > > >
    > > > But in Elasticsearch, and some other Metron components, the parsing is
    > > > done by Yaml. This means:
    > > > - To be a list, square brackets must be provided – either in the
    > > > value, the python processing, or the template. If only one value is
    > > > provided it does not have to be in a list.
    > > > - List elements want to be delimited by comma-space, not just comma
    > > > (although it’s not clear whether this actually causes errors with
    > > > non-numeric list elements)
    > > > - Quote marks around string list elements are optional except when
    > > > necessary. This greatly increases the opportunity for confusion and
    > > > error.
    > > > - Colon is a special character (related to dictionary parsing), so if
    > > > you need a colon in a string, the string needs quote marks. “_local_”
    > > > doesn’t need quote marks; “_local:ipv4_” does require quote marks.
    > > > Character sequences that would mis-parse as poorly formed numbers also
    > > > need quote marks: “0.0.0.0”.
    > > >
    > > > Fact C. The “network.host” Elasticsearch parameter is a cheat, both
    > > > way more powerful and way more limited than one might expect.
    > > > It is a cheat because it masks two underlying parameters:
    > > > network.bind_host and network.publish_host. This is all documented at
    > > > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
    > > > and implemented in
    > > > https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/common/network/NetworkService.java
    > > > (methods resolveBindHostAddresses() and resolvePublishHostAddresses()).
    > > > - network.bind_host is the set of addresses Elasticsearch “binds to”
    > > > (listens on). Supposedly it will actually bind to multiple network
    > > > addresses if available and specified. Whatever set of specifiers you
    > > > give network.host gets expanded into a list of actual bind addresses.
    > > > If you give it the wildcard value (“0.0.0.0” for ipv4), it will bind
    > > > to all available addresses.
    > > > - network.publish_host is the address Elasticsearch “publishes” for
    > > > clients and other servers to connect to. It will publish only one
    > > > address. If you give it a set of addresses, it picks the most
    > > > “desirable” of the set – it ensures the address is actually
    > > > accessible, and it prefers ipv4 (or 6, depending on another config),
    > > > then global, then site-local, then link-local, then loopback. Within
    > > > each category it orders by numeric magnitude of the IP address, which
    > > > is hardly meaningful. This means the published address can be wrong on
    > > > a multi-homed server or VM, if you don’t appropriately constrain it.
    > > > - The parameter values can be network addresses, network interface
    > > > names, host names (to be dereferenced via DNS), “special” names
    > > > denoting predefined sets of addresses, and combinations of the above.
    > > > - Wildcard and loopback addresses are allowed.
    > > > - If the wildcard is provided it must be the ONLY value provided (list
    > > > of length == 1), or ES will throw an error.
    > > >
    > > > Discussion item 1: If you use network.host, the same list of addresses
    > > > gets sent to both network.bind_host and network.publish_host. The
    > > > algorithm for picking the single publish_host address is not good
    > > > enough, at least in ES 2.3, to give certainty that the right address
    > > > will be published on multi-homed servers or VMs (although on
    > > > non-multi-homed, it should generally work fine).
    > > >
    > > > It seems to me that specifying exactly one of _local_, _site_, or
    > > > _global_ will usually give the right result, but that too can fail if
    > > > the server has multiple addresses within the same category.
    > > >
    > > > I think network.bind_host and network.publish_host should be
    > > > separately configured, as they are with Hadoop.
    > > > There’s an article here:
    > > > https://community.hortonworks.com/content/kbentry/24277/parameters-for-multi-homing.html
    > > > that discusses these issues at some length, and clarifies why they
    > > > must be separately configured.
    > > >
    > > > What do you-all think?
    > > >
    > > > Discussion item 2: While it’s fine to use 0.0.0.0 for the bind
    > > > address, it gives no guidance at all to the needed publish_host value.
    > > > Using _local_ for QuickDev and single-node deployments, and _site_ for
    > > > FullDev deployments and all cluster deployments, is probably a
    > > > reasonable choice for publish_host.
    > > >
    > > > What do you-all think?
    > > >
    > > > Discussion item 3: Should we attempt to further the “hadoop style” of
    > > > config parameter, and silently add the square brackets and perhaps
    > > > substring quotes in python processing? Or should we say users need to
    > > > understand ES configuration, and tell them to put the list in square
    > > > brackets themselves, if they need a list entry in this parameter, per
    > > > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
    > > > ?
    > > >
    > > > Please share your thoughts,
    > > > Thanks,
    > > > --Matt
    > > >
    > > >
    > > > On 5/2/17, 9:57 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:
    > > >
    > > > Hi Otto,
    > > > This event derives from this line of code:
    > > > https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/action/support/master/TransportMasterNodeAction.java#L148
    > > > which suggests that a cluster action has been requested on a local
    > > > (loopback) address. This is not surprising given what I’ve learned
    > > > about the semantics of network.host with wildcard address.
    > > > See next message, item C. Basically, while the wildcard causes ES to
    > > > “listen” on all IP addresses, it only *publishes* one, and on a
    > > > multi-homed server it can be the wrong one. I can’t be certain this
    > > > causes what you’re seeing, but it seems feasible.
    > > >
    > > > From: Otto Fowler <ottobackwa...@gmail.com>
    > > > Date: Tuesday, May 2, 2017 at 8:30 PM
    > > > To: "d...@metron.incubator.apache.org" <dev@metron.incubator.apache.org>,
    > > > Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
    > > > <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
    > > > Subject: Re: Request double-check on Ambari config logic (ES network_host)
    > > >
    > > > OK.
    > > > I tried it using this method, and master ( adding [] ). In both
    > > > cases, I can hit 9200 from other machines, but in both cases I’m
    > > > getting ES master errors:
    > > >
    > > > ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
    > > > at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:174)
    > > > at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:66)
    > > > at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:41)
    > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:148)
    > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.start(TransportMasterNodeAction.java:140)
    > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:107)
    > > > at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:51)
    > > > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
    > > > at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:98)
    > > > at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:66)
    > > > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
    > > > at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
    > > > at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
    > > > at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
    > > > at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
    > > > at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
    > > > at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
    > > > at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:371)
    > > > at org.elasticsearch.rest.action.index.RestIndexAction.handleRequest(RestIndexAction.java:102)
    > > > at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
    > > > at org.elasticsearch.rest.RestController.executeHandler(RestController.java:205)
    > > > at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:166)
    > > > at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:128)
    > > > at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:86)
    > > > at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServ
    > > >
    > > > and kibana is not good.
    > > >
    > > > not sure what that error means.
    > > > I have 5 nodes, and put es master on #5, with #3,4 as datanodes.
    > > >
    > > > Sorry, but I don’t think my setup is going to be much help at this
    > > > point.
    > > >
    > > >
    > > >
    > > >
    > > > On May 2, 2017 at 17:19:43, Matt Foley (mfo...@hortonworks.com) wrote:
    > > > The default will now be “0.0.0.0”, and not eth0. And this will work
    > > > if suggestions from various community members and a suggestion in the
    > > > old 1.x documentation for ES are correct. The 2.x documentation (we
    > > > specify ES 2.3) doesn’t mention “0.0.0.0”; I think it’s likely to
    > > > still work, but it needs testing.
    > > >
    > > > Thanks,
    > > > --Matt
    > > >
    > > > From: Otto Fowler <ottobackwa...@gmail.com>
    > > > Date: Tuesday, May 2, 2017 at 11:27 AM
    > > > To: "d...@metron.incubator.apache.org" <d...@metron.incubator.apache.org>,
    > > > Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
    > > > <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
    > > > Subject: Re: Request double-check on Ambari config logic (ES network_host)
    > > >
    > > > Are you saying that the defaults should work now?
    > > > Or they should work, but I still need to change the interface from
    > > > eth0?
    > > >
    > > >
    > > >
    > > >
    > > > On May 2, 2017 at 13:36:11, Matt Foley (mfo...@hortonworks.com) wrote:
    > > > Hi Otto,
    > > > The basic change to use “0.0.0.0” as the default binding, and put the
    > > > square brackets in the template text instead of the parameter value,
    > > > is now available in
    > > > https://github.com/mattf-horton/incubator-metron branch METRON-905
    > > > commit e879719a0c3fb
    > > >
    > > > I’m having some trouble with my test env, so if you wanted to give it
    > > > a try, that would be great.
    > > > If the “0.0.0.0” doesn’t work, then we should use
    > > > "_local_", "_site_"
    > > > those being the ES special values that mean approximately the same.
    > > >
    > > > I’m going to have to do trial-and-error to determine the exact
    > > > behavior of multi-item lists, and then write the python code to strip
    > > > redundant square brackets if included in the parameter value.
    > > > Thanks,
    > > > --Matt
    > > >
    > > >
    > > > On 5/2/17, 6:44 AM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
    > > >
    > > > I am working on a centos 7 cluster deploy for testing the steps.
    > > > I have this issue ( along with the wrong interface name ) and can
    > > > test when you have it.
    > > >
    > > > An eta would help?
    > > >
    > > >
    > > > On May 2, 2017 at 09:14:10, zeo...@gmail.com (zeo...@gmail.com) wrote:
    > > >
    > > > Are you working on this one? The JIRA doesn't look like it's
    > > > currently assigned. Thanks,
    > > >
    > > > Jon
    > > >
    > > > On Mon, May 1, 2017 at 6:40 PM Matt Foley <mfo...@hortonworks.com> wrote:
    > > >
    > > > > Ah, I see I mis-read METRON-897, and Nick specifically says
    > > > > "lo:ipv4","eth0:ipv4" did not work for him, but
    > > > > ["_lo:ipv4_","_eth0:ipv4_"] did work.
    > > > >
    > > > > So I went back and dug a little deeper, and realized that in the
    > > > > environment where "lo:ipv4","eth0:ipv4" worked for me, I had
    > > > > modified the yaml.j2 template to include the square brackets.
    > > > >
    > > > > So the below theory is wrong. Back to the drawing board.
    > > > > Thanks,
    > > > > --Matt
    > > > >
    > > > > On 5/1/17, 3:08 PM, "Matt Foley" <ma...@apache.org> wrote:
    > > > >
    > > > > Hi, there have been widely varying statements about what needs to
    > > > > be in the Elasticsearch config parameter “network_host”. I think I
    > > > > may have a rationale for what works and what doesn’t, but I’d like
    > > > > your input or correction.
    > > > >
    > > > > I am focusing on what worked in terms of punctuation (quotes and
    > > > > square brackets) with the old _lo:ip4_,_eth0:ip4_. I would like to
    > > > > ignore for the moment, please, whether eth0 was the correct name for
    > > > > a given env, and whether we can use 0.0.0.0. Instead, for systems
    > > > > where eth0 WAS the correct name, I’d like to understand what worked
    > > > > and why.
    > > > >
    > > > > It’s complicated because the value starts out in xml, is read into
    > > > > python, printed by jinja, then consumed by yaml.
    > > > >
    > > > > I think there were two constructs that actually worked for this
    > > > > param. Please say whether this is consistent or inconsistent with
    > > > > your experience:
    > > > >
    > > > > "_lo:ip4_","_eth0:ip4_"
    > > > > This worked for me. I think this was read from XML into python as a
    > > > > list of strings, then output in jinja ‘print statement‘
    > > > > {{ network_host }} as a python literal list with form:
    > > > > [ "_lo:ip4_", "_eth0:ip4_" ]
    > > > > In other words, the print statement for a python list object
    > > > > injected the needed square brackets.
    > > > >
    > > > > and
    > > > > "[ _lo:ip4_, _eth0:ip4_ ]"
    > > > > Nick and Anand, please confirm if this is the form that worked for
    > > > > you. I think this was read from XML into python as a single string,
    > > > > and output in the same jinja print statement as:
    > > > > [ _lo:ip4_, _eth0:ip4_ ]
    > > > > because the print statement for a python string object does not
    > > > > produce quote marks.
    > > > >
    > > > > In either case, yaml (the consumer of the jinja output) saw what it
    > > > > interprets as a list of strings (since quotes are optional for yaml
    > > > > strings).
    > > > >
    > > > > What didn’t work was:
    > > > >
    > > > > * "_lo:ip4_, _eth0:ip4_"
    > > > > This would be read in and output as a single string, and no square
    > > > > brackets would ever be introduced.
    > > > >
    > > > > * _lo:ip4_, _eth0:ip4_ or [ _lo:ip4_, _eth0:ip4_ ]
    > > > > (without quotes) I think the unquoted colons messed up the python
    > > > > parsing
    > > > >
    > > > > Finally, I don’t know whether
    > > > > * [ "_lo:ip4_", "_eth0:ip4_" ]
    > > > > worked or not, I’m not sure anyone ever tried it. By the above
    > > > > logic it probably should work.
    > > > >
    > > > > Please give me your input if you have touched on these issues.
    > > > > Thanks,
    > > > > --Matt
    > > > >
    > > > >
    > > > >
    > > > >
    > > > >
    > > > >
    > > > > --
    > > >
    > > > Jon
    > > >
    > > >
    > > >
    > > > --
    > >
    > > Jon
    > >
    >
    

