I think it would be very valuable to split out publish_host and bind_host,
but I think it would make sense to do that work as part of a separate JIRA.


I would like to see us in a position to ship 0.4.0 as quickly as possible. Is
there a quick solution that reasonably balances security and working
out-of-the-box here, Matt?










On Wed, May 3, 2017 at 8:30 AM, Nick Allen <n...@nickallen.org> wrote:

> It only worked "good enough" on Ansible because it was mainly used for
> deploying to a controlled environment where we know the interface names;
> aka Vagrant/Single Node.
>
> It did not work well at all on environments other than Vagrant/Single
> Node.  The work that was done with Elasticsearch and Ambari gives us
> significantly more functionality.
>
> The issue now is in getting this to work safely, out-of-the-box on a much
> wider range of platforms; especially ones which will have different network
> setups.
>
> And for the record, in Ansible it simply defaulted to eth0
>
>    - elasticsearch_network_interface: eth0
>      <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/defaults/main.yml#L19>
>    - 'network.host: ["_{{ elasticsearch_network_interface }}:ipv4_","_local:ipv4_"]
>      <https://github.com/apache/incubator-metron/blob/Metron_0.3.1/metron-deployment/roles/elasticsearch/tasks/elasticsearch.yml#L69>
>
>
>
>
> On Wed, May 3, 2017 at 7:56 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> How is the ambari service install configuration different from prior
>> configuration through ansible?
>> This used to work better right?
>>
>>
>> On May 3, 2017 at 07:06:52, zeo...@gmail.com (zeo...@gmail.com) wrote:
>>
>> Thanks for the good write up Matt.  Here are my thoughts:
>>
>> D1: I don't see a way to have a default that works in every scenario.
>> Documenting this and setting a sane default that works most of the time is
>> probably the best path forward.
>>
>> D2: If we use _local_ and _site_, shouldn't it prioritize site for
>> publishing, like we want?  I guess if you have multiple interfaces that
>> fit in site it is not super obvious to an end user which will be
>> specified, although it is programmatic like you mentioned above.  Are we
>> specifically trying to bind to a global IP?
>>
>> To reinforce my prior comment, as a system owner who has publicly
>> addressable IPs on systems, I do NOT want _global_ included by default,
>> and thus would strongly discourage using 0.0.0.0 as well.  This is asking
>> for trouble.
>>
>> D3: To avoid confusion, I think ES should be configured like ES, and vice
>> versa.  Think of people who have well tuned ES systems and want to port
>> their configs into Metron.
>>
>> Another thought - is this handled better if we upgrade ES?  Afaik we don't
>> really depend on ES for much, and an upgrade has other benefits, among
>> those being able to natively support periods in field names[1].  I am
>> doubtful this will resolve any of our concerns but figured I'd mention it
>> anyway.
>>
>> In a separate ES related JIRA I'm working on, I will either need to de_dot
>> bro fields in the parser, force the transformation in the Kafka plugin
>> (not preferred), provide an example of how to do this in bro configs (not
>> very obvious to those new to bro/es), give an example of transforming in
>> stellar, or upgrade ES.  I'm leaning towards upgrading ES to 2.4 at least,
>> if not 5.x.
>>
>> 1: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html
>>
>> Jon
>>
>> On Wed, May 3, 2017, 1:50 AM Matt Foley <ma...@apache.org> wrote:
>>
>> > Okay, several items that merit discussion:
>> >
>> > Fact A. Experiment shows that the contents of the <value> fields in
>> > elastic-site.xml, and hence the values in Ambari GUI config fields, are
>> > just used as big unquoted Unicode character sequences, including any
>> > quote marks, square brackets or other punctuation, until they are written
>> > into the yaml.j2 template by the {{ }} operator.  Thus, the value:
>> >     ["_eth0_","_lo_"]
>> > is a 17-character Unicode string.  Yaml, of course, actually parses the
>> > result.
>> > This is actually nice: it makes it easy to understand and manipulate the
>> > textual content of the field.
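
A minimal sketch of Fact A in Python, assuming a plain text substitution stands in for the jinja {{ }} operator. The template line and the variable names are hypothetical and are not taken from the Metron sources. It shows that the value is carried as plain text (17 characters, brackets and quotes included) and only acquires list meaning once Yaml parses the rendered line.

```python
# The value exactly as it would be typed into the Ambari config field.
raw_value = '["_eth0_","_lo_"]'

# Hypothetical template line; the real yaml.j2 content may differ.
template_line = "network.host: {{ network_host }}"

# Stand-in for the jinja {{ }} substitution: pure text replacement.
rendered = template_line.replace("{{ network_host }}", raw_value)

print(len(raw_value))  # brackets, quotes and comma all count as characters
print(rendered)        # this rendered text is what the Yaml parser sees
```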
>> >
>> > Fact B. In the Hadoop world, config parameters that are lists are
>> > usually single strings containing a sequence of unquoted comma-delimited
>> > substrings with no blank spaces.  The substring elements of the list are
>> > forbidden to have commas or anything else that would disrupt fairly
>> > obvious parsing.  Parsing is done by apache commons code or plain old
>> > Java.  Users are USED to working with these kinds of config params in
>> > Ambari.
>> >
>> > But in Elasticsearch, and some other Metron components, the parsing is
>> > done by Yaml.  This means:
>> > -    To be a list, square brackets must be provided – either in the
>> > value, the python processing, or the template.  If only one value is
>> > provided it does not have to be in a list.
>> > -    List elements want to be delimited by comma-space, not just comma
>> > (although it’s not clear whether this actually causes errors with
>> > non-numeric list elements)
>> > -    Quote marks around string list elements are optional except when
>> > necessary.  This greatly increases the opportunity for confusion and
>> > error.
>> > -    Colon is a special character (related to dictionary parsing), so if
>> > you need a colon in a string, the string needs quote marks.  “_local_”
>> > doesn’t need quote marks; “_local:ipv4_” does require quote marks.
>> > Character sequences that would mis-parse as poorly formed numbers also
>> > need quote marks: “0.0.0.0”.
>> >
>> > Fact C. The “network.host” Elasticsearch parameter is a cheat, both way
>> > more powerful and way more limited than one might expect.
>> > It is a cheat because it masks two underlying parameters:
>> > network.bind_host and network.publish_host.  This is all documented at
>> > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
>> > and implemented in
>> > https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/common/network/NetworkService.java
>> > (methods resolveBindHostAddresses() and resolvePublishHostAddresses()).
>> > -    network.bind_host is the set of addresses Elasticsearch “binds to”
>> > (listens on). Supposedly it will actually bind to multiple network
>> > addresses if available and specified.  Whatever set of specifiers you
>> > gave network.host gets expanded into a list of actual bind addresses.  If
>> > you give it the wildcard value (“0.0.0.0” for ipv4), it will bind to all
>> > available addresses.
>> > -    network.publish_host is the address Elasticsearch “publishes” for
>> > clients and other servers to connect to. It will publish only one
>> > address.  If you give it a set of addresses, it picks the most
>> > “desirable” of the set – it ensures it actually is accessible, and it
>> > prefers ipv4 (or 6, depending on another config), then global, then
>> > site-local, then link-local, then loopback. Within each category it
>> > orders by numeric magnitude of the IP address, which is hardly
>> > meaningful.  This means the published address can be wrong on a
>> > multi-homed server or VM, if you don’t appropriately constrain it.
>> > -    The parameter values can be network addresses, network interface
>> > names, host names (to be dereferenced via DNS), “special” names denoting
>> > predefined sets of addresses, and combinations of the above.
>> > -    Wildcard and loopback addresses are allowed.
>> > -    If the wildcard is provided it must be the ONLY value provided
>> > (list of length == 1), or ES will throw an error.
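
The publish-address preference described above can be approximated with Python's standard ipaddress module. This is only a sketch of the ordering as stated in this message, not a port of NetworkService.java. The function names are hypothetical, private ranges are treated as "site-local", ascending numeric order within a category is an assumption, and the accessibility check is left out.

```python
import ipaddress


def category_rank(addr):
    """Lower rank = more preferred: global, site-local, link-local, loopback."""
    if addr.is_loopback:
        return 3
    if addr.is_link_local:
        return 2
    if addr.is_private:
        return 1  # assumption: private ranges stand in for "site-local"
    return 0      # everything else is treated as "global"


def pick_publish_address(candidates, prefer_ipv4=True):
    """Pick the single address that would be published from a candidate set."""
    addrs = [ipaddress.ip_address(c) for c in candidates]

    def sort_key(addr):
        family_rank = 0 if (addr.version == 4) == prefer_ipv4 else 1
        # Assumption: ties within a category are broken by ascending value.
        return (family_rank, category_rank(addr), int(addr))

    return str(min(addrs, key=sort_key))


# Two site-local addresses on a multi-homed VM: the winner is decided purely
# by numeric value, which says nothing about which interface peers can reach.
print(pick_publish_address(["127.0.0.1", "192.168.66.121", "10.0.2.15"]))
```

Under these assumptions the numerically smaller 10.0.2.15 wins over 192.168.66.121, which illustrates why an unconstrained candidate set can publish the wrong address.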
>> >
>> > Discussion item 1:  If you use network.host, the same list of addresses
>> > gets sent to both network.bind_host and network.publish_host.  The
>> > algorithm for picking the single publish_host address is not good
>> > enough, at least in ES 2.3, to give certainty that the right address will
>> > be published on multi-homed servers or VMs (although on non-multi-homed
>> > hosts it should generally work fine).
>> >
>> > It seems to me that specifying exactly one of _local_, _site_, or
>> > _global_ will usually give the right result, but that too can fail if
>> > the server has multiple addresses within the same category.
>> >
>> > I think network.bind_host and network.publish_host should be separately
>> > configured, as they are with Hadoop.
>> > There’s an article here:
>> > https://community.hortonworks.com/content/kbentry/24277/parameters-for-multi-homing.html
>> > that discusses these issues at some length, and clarifies why they must
>> > be separately configured.
>> >
>> > What do you-all think?
>> >
>> > Discussion item 2:  While it’s fine to use 0.0.0.0 for the bind address,
>> > it gives no guidance at all to the needed publish_host value. Using
>> > _local_ for QuickDev and single-node deployments, and _site_ for FullDev
>> > deployments and all cluster deployments, is probably a reasonable choice
>> > for publish_host.
>> >
>> > What do you-all think?
>> >
>> > Discussion item 3: Should we attempt to further the “hadoop style” of
>> > config parameter, and silently add the square brackets and perhaps
>> > substring quotes in python processing?  Or should we say users need to
>> > understand ES configuration, and tell them to put the list in square
>> > brackets themselves, if they need a list entry in this parameter, per
>> > https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-network.html
>> > ?
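
For the first option, here is a rough Python sketch of what "silently adding the square brackets and substring quotes" could look like. The helper name to_yaml_flow_list is hypothetical and is not existing Metron code. It accepts either a Hadoop-style comma-delimited string or a value that already carries brackets and quotes, and emits a quoted Yaml flow list with comma-space delimiters.

```python
def to_yaml_flow_list(value):
    """Normalize a config value into a quoted Yaml flow list string.

    Accepts 'a,b', '"a","b"', or '[a, b]' and returns '["a", "b"]'.
    Redundant outer square brackets and per-item quotes are stripped first.
    """
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Split on commas, then drop surrounding whitespace and quote marks.
    items = [item.strip().strip("\"'") for item in text.split(",")]
    items = [item for item in items if item]
    # Always quote, so values containing ':' or shaped like 0.0.0.0 are safe.
    return "[" + ", ".join('"%s"' % item for item in items) + "]"


print(to_yaml_flow_list("_local_,_site_"))
print(to_yaml_flow_list('["_lo:ipv4_","_eth0:ipv4_"]'))
print(to_yaml_flow_list("0.0.0.0"))
```

Quoting every element unconditionally sidesteps the "optional except when necessary" rule from Fact B, at the cost of hiding the Yaml syntax from users who already know it.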
>> >
>> > Please share your thoughts,
>> > Thanks,
>> > --Matt
>> >
>> >
>> > On 5/2/17, 9:57 PM, "Matt Foley" <mfo...@hortonworks.com> wrote:
>> >
>> >     Hi Otto,
>> >     This event derives from this line of code:
>> >     https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/action/support/master/TransportMasterNodeAction.java#L148
>> >     which suggests that a cluster action has been requested on a local
>> >     (loopback) address.  This is not surprising given what I’ve learned
>> >     about the semantics of network.host with wildcard address.
>> >     See next message, item C.  Basically, while the wildcard causes ES to
>> >     “listen” on all IP addresses, it only *publishes* one, and on a
>> >     multi-homed server it can be the wrong one.  I can’t be certain this
>> >     causes what you’re seeing, but it seems feasible.
>> >
>> >     From: Otto Fowler <ottobackwa...@gmail.com>
>> >     Date: Tuesday, May 2, 2017 at 8:30 PM
>> >     To: "d...@metron.incubator.apache.org" <d...@metron.incubator.apache.org>,
>> >     Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
>> >     <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
>> >     Subject: Re: Request double-check on Ambari config logic (ES network_host)
>> >
>> >     OK.
>> >     I tried it using this method, and master ( adding [] ).  In both
>> >     cases, I can hit 9200 from other machines, but in both cases I’m
>> >     getting ES master errors:
>> >
>> >     ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
>> >     at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:174)
>> >     at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:66)
>> >     at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:41)
>> >     at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:148)
>> >     at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.start(TransportMasterNodeAction.java:140)
>> >     at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:107)
>> >     at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:51)
>> >     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
>> >     at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:98)
>> >     at org.elasticsearch.action.index.TransportIndexAction.doExecute(TransportIndexAction.java:66)
>> >     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:137)
>> >     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
>> >     at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
>> >     at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
>> >     at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
>> >     at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
>> >     at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
>> >     at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:371)
>> >     at org.elasticsearch.rest.action.index.RestIndexAction.handleRequest(RestIndexAction.java:102)
>> >     at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
>> >     at org.elasticsearch.rest.RestController.executeHandler(RestController.java:205)
>> >     at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:166)
>> >     at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:128)
>> >     at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:86)
>> >     at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServ
>> >
>> >     and kibana is not good.
>> >
>> >     not sure what that error means.
>> >     I have 5 nodes, and put es master on #5, with #3,4 as datanodes.
>> >
>> >     Sorry, but I don’t think my setup is going to be much help at this
>> >     point.
>> >
>> >
>> >
>> >
>> >     On May 2, 2017 at 17:19:43, Matt Foley (mfo...@hortonworks.com) wrote:
>> >     The default will now be “0.0.0.0”, and not eth0. And this will work
>> >     if suggestions from various community members and a suggestion in the
>> >     old 1.x documentation for ES are correct. The 2.x documentation (we
>> >     specify ES 2.3) doesn’t mention “0.0.0.0”; I think it’s likely to
>> >     still work, but it needs testing.
>> >
>> >     Thanks,
>> >     --Matt
>> >
>> >     From: Otto Fowler <ottobackwa...@gmail.com>
>> >     Date: Tuesday, May 2, 2017 at 11:27 AM
>> >     To: "d...@metron.incubator.apache.org" <d...@metron.incubator.apache.org>,
>> >     Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org"
>> >     <dev@metron.apache.org>, "zeo...@gmail.com" <zeo...@gmail.com>
>> >     Subject: Re: Request double-check on Ambari config logic (ES network_host)
>> >
>> >     Are you saying that the defaults should work now?
>> >     Or they should work, but I still need to change the interface from
>> >     eth0?
>> >
>> >
>> >
>> >
>> >     On May 2, 2017 at 13:36:11, Matt Foley (mfo...@hortonworks.com) wrote:
>> >     Hi Otto,
>> >     The basic change to use “0.0.0.0” as the default binding, and put the
>> >     square brackets in the template text instead of the parameter value,
>> >     is now available in
>> >     https://github.com/mattf-horton/incubator-metron branch METRON-905
>> >     commit e879719a0c3fb
>> >
>> >     I’m having some trouble with my test env, so if you wanted to give it
>> >     a try, that would be great.
>> >     If the “0.0.0.0” doesn’t work, then we should use
>> >     "_local_", "_site_"
>> >     that being the ES special values that mean approximately the same.
>> >
>> >     I’m going to have to do trial-and-error to determine the exact
>> >     behavior of multi-item lists, and then write the python code to strip
>> >     redundant square brackets if included in the parameter value.
>> >     Thanks,
>> >     --Matt
>> >
>> >
>> >     On 5/2/17, 6:44 AM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
>> >
>> >     I am working on a centos 7 cluster deploy for testing the steps.
>> >     I have this issue ( along with the wrong interface name ) and can
>> >     test when you have it.
>> >
>> >     An eta would help?
>> >
>> >
>> >     On May 2, 2017 at 09:14:10, zeo...@gmail.com (zeo...@gmail.com) wrote:
>> >
>> >     Are you working on this one? The JIRA doesn't look like it's
>> >     currently assigned. Thanks,
>> >
>> >     Jon
>> >
>> >     On Mon, May 1, 2017 at 6:40 PM Matt Foley <mfo...@hortonworks.com> wrote:
>> >
>> >     > Ah, I see I mis-read METRON-897, and Nick specifically says
>> >     > "lo:ipv4","eth0:ipv4" did not work for him, but
>> >     ["_lo:ipv4_","_eth0:ipv4_"]
>> >     > did work.
>> >     >
>> >     > So I went back and dug a little deeper, and realized that in the
>> >     > environment where "lo:ipv4","eth0:ipv4" worked for me, I had
>> >     > modified the yaml.j2 template to include the square brackets.
>> >     >
>> >     > So the below theory is wrong. Back to the drawing board.
>> >     > Thanks,
>> >     > --Matt
>> >     >
>> >     > On 5/1/17, 3:08 PM, "Matt Foley" <ma...@apache.org> wrote:
>> >     >
>> >     > Hi, there have been widely varying statements about what needs to be
>> >     > in the Elasticsearch config parameter “network_host”. I think I may
>> >     > have a rationale for what works and what doesn’t, but I’d like your
>> >     > input or correction.
>> >     >
>> >     > I am focusing on what worked in terms of punctuation (quotes and
>> >     > square brackets) with the old _lo:ip4_,_eth0:ip4_. I would like to
>> >     > ignore for the moment, please, whether eth0 was the correct name for
>> >     > a given env, and whether we can use 0.0.0.0. Instead, for systems
>> >     > where eth0 WAS the correct name, I’d like to understand what worked
>> >     > and why.
>> >     >
>> >     > It’s complicated because the value starts out in xml, is read into
>> >     > python, printed by jinja, then consumed by yaml.
>> >     >
>> >     > I think there were two constructs that actually worked for this
>> >     > param. Please say whether this is consistent or inconsistent with
>> >     > your experience:
>> >     >
>> >     > "_lo:ip4_","_eth0:ip4_"
>> >     > This worked for me. I think this was read from XML into python as a
>> >     > list of strings, then output in jinja ‘print statement’
>> >     > {{ network_host }} as a python literal list with form:
>> >     > [ "_lo:ip4_", "_eth0:ip4_" ]
>> >     > In other words, the print statement for a python list object
>> >     > injected the needed square brackets.
>> >     >
>> >     > and
>> >     > "[ _lo:ip4_, _eth0:ip4_ ]"
>> >     > Nick and Anand, please confirm if this is the form that worked for
>> >     > you. I think this was read from XML into python as a single string,
>> >     > and output in the same jinja print statement as:
>> >     > [ _lo:ip4_, _eth0:ip4_ ]
>> >     > because the print statement for a python string object does not
>> >     > produce quote marks.
>> >     >
>> >     > In either case, yaml (the consumer of the jinja output) saw what it
>> >     > interprets as a list of strings (since quotes are optional for yaml
>> >     > strings).
>> >     >
>> >     > What didn’t work was:
>> >     >
>> >     > * "_lo:ip4_, _eth0:ip4_"
>> >     > This would be read in and output as a single string, and no square
>> >     > brackets would ever be introduced.
>> >     >
>> >     > * _lo:ip4_, _eth0:ip4_ or [ _lo:ip4_, _eth0:ip4_ ]
>> >     > (without quotes) I think the unquoted colons messed up the python
>> >     > parsing
>> >     >
>> >     > Finally, I don’t know whether
>> >     > * [ "_lo:ip4_", "_eth0:ip4_" ]
>> >     > worked or not, I’m not sure anyone ever tried it. By the above logic
>> >     > it probably should work.
>> >     >
>> >     > Please give me your input if you have touched on these issues.
>> >     > Thanks,
>> >     > --Matt
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >
>> >     > --
>> >
>> >     Jon
>> >
>> >
>> >
>> > --
>>
>> Jon
>>
>
>
