[ https://issues.apache.org/jira/browse/NIFI-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763056#comment-15763056 ]

ASF GitHub Bot commented on NIFI-2585:
--------------------------------------

GitHub user ijokarumawak opened a pull request:

    https://github.com/apache/nifi/pull/1342

    NIFI-2585 Add attributes to track s2s host and port

    I left the four commits to show the history of how this evolved. We can 
squash Bryan's two commits when merging, but we wanted Randy to get credit for 
starting the work on this ticket.
    
    While reviewing #1320, I found a few concerns, summarized in the table below.
    Let's say `client.nifi` and `server.nifi` transfer files to each other using S2S. This table shows how the `s2s.address` attribute value is set in each case with #1320:
    
    | Transfer Protocol | Pulled from a remote OutputPort to the S2S client | Received via an InputPort at the S2S server |
    |-------------------|---------------------------------------------------|---------------------------------------------|
    | RAW | The server's hostname and port are the ones the client receives when it retrieves the remote NiFi peers. Those values are defined in the server's nifi.properties. <br/> `nifi.remote.input.host:nifi.remote.input.socket.port` <br/> e.g. `server.nifi:8081` | The client's hostname and port are used. However, the existing logic parses the URL string to extract the hostname and port. If the URL string contains an illegal character such as an underscore, the result becomes null and is then converted to `unknown`. This happens with Docker, because container hostnames often contain underscores. <br/> e.g. `client.nifi:58034`, or `unknown:unknown` if the hostname contains an underscore <br/> **FIX 1** <br/> `client_nifi:58034` (even if the hostname contains an underscore) |
    | HTTP | Same as the RAW protocol, but uses `nifi.web.http(s).port`. <br/> e.g. `server.nifi:8080` | The client's hostname and port are retrieved from the HTTP request object. However, it returns the string representation of the IP address instead of the hostname, probably for performance reasons. <br/> e.g. `192.168.0.33:59946` <br/> **FIX 2** <br/> `client.nifi:59946` |
    
    ## FIX 1
    
    As reported in [the previous PR comment](https://github.com/apache/nifi/pull/1307#issuecomment-266442657), I got an 'unknown' hostname and port with Docker containers. This could be treated as a corner case, because underscores in hostnames are not allowed by the URL specification. Still, I think it would be better to be lenient here so that Docker environments are supported. I also confirmed that provenance events failed to resolve the hostname and produced provenance details with a null hostname.
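The failure mode above can be reproduced with plain `java.net.URI`. This is a minimal sketch that assumes the peer URL is parsed with `java.net.URI`; that is an assumption about Peer's internals, not something this thread states. The `nifi://` scheme and the hostnames are illustrative only.

When a hostname contains an underscore, `URI` cannot parse the authority as server-based. It falls back to a registry-based authority, so `getHost()` returns `null` and `getPort()` returns `-1`.

```java
import java.net.URI;

class UnderscoreHostDemo {
    // Returns "host:port" exactly as java.net.URI parses it.
    // An unparsed host prints as "null" and an unparsed port as -1.
    static String hostAndPort(final String url) {
        final URI uri = URI.create(url);
        return uri.getHost() + ":" + uri.getPort();
    }

    public static void main(final String[] args) {
        // A DNS-style hostname parses as a server-based authority.
        System.out.println(hostAndPort("nifi://client.nifi:58034")); // client.nifi:58034
        // The underscore makes URI fall back to a registry-based authority.
        System.out.println(hostAndPort("nifi://client_nifi:58034")); // null:-1
        // The raw authority string is still available, just not split into host/port.
        System.out.println(URI.create("nifi://client_nifi:58034").getAuthority()); // client_nifi:58034
    }
}
```

This is why any code path that derives the host and port by parsing the URL ends up with null values for typical Docker container names.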
    
    This can be improved by changing Peer's constructor. Currently, it parses peerUrl and then uses its hostname and port, but the same information can be retrieved from PeerDescription without parsing the URL. Also, SocketRemoteSiteListener uses the server's hostname and port for the PeerDescription, which seems incorrect: they should be the client's.
    
    This commit modifies SocketRemoteSiteListener to use PeerDescription instead of parsing the URL string.
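The approach above can be sketched as follows: take the host and port from the peer description instead of parsing the URL, and build that description from the remote end of the accepted socket. `SimplePeerDescription` and `SimplePeer` are hypothetical, simplified stand-ins. They are not NiFi's actual `PeerDescription` and `Peer` classes, and their constructors and factory method are assumptions made for illustration.

```java
import java.net.Socket;

// Hypothetical, simplified stand-in for NiFi's PeerDescription (not the real class).
final class SimplePeerDescription {
    private final String hostname;
    private final int port;
    private final boolean secure;

    SimplePeerDescription(final String hostname, final int port, final boolean secure) {
        this.hostname = hostname;
        this.port = port;
        this.secure = secure;
    }

    // Describes the *remote* end of an accepted socket, i.e. the S2S client,
    // rather than a hard-coded "localhost" and the server's own port.
    static SimplePeerDescription fromAcceptedSocket(final Socket socket, final boolean secure) {
        return new SimplePeerDescription(
                socket.getInetAddress().getHostName(), // client's hostname (reverse lookup)
                socket.getPort(),                      // client's remote port
                secure);
    }

    String getHostname() { return hostname; }
    int getPort() { return port; }
    boolean isSecure() { return secure; }
}

// Hypothetical, simplified stand-in for NiFi's Peer (not the real class).
final class SimplePeer {
    private final SimplePeerDescription description;
    private final String peerUrl;

    SimplePeer(final SimplePeerDescription description, final String peerUrl) {
        this.description = description;
        // The URL is stored as-is (e.g. for transit URIs) but never parsed,
        // so an underscore in the hostname cannot turn host/port into null/-1.
        this.peerUrl = peerUrl;
    }

    String getHost() { return description.getHostname(); }
    int getPort() { return description.getPort(); }
    String getUrl() { return peerUrl; }
}
```

The design choice is to have a single source of truth. Once the description carries the host and port, keeping separate host and port fields parsed from the URL only gives them a chance to disagree.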
    
    ## FIX 2
    
    This commit modifies DataTransferResource to resolve the hostname from the IP address using InetAddress when the HTTP transport protocol is used, just as RAW resolves the hostname from the socket using InetAddress.
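The HTTP-side resolution described above can be sketched with `InetAddress`. `RemoteHostResolver.resolveHostname` is a hypothetical helper, not DataTransferResource's actual code. It assumes the input is the client IP string reported by the servlet layer (for example, `HttpServletRequest#getRemoteAddr`). The servlet API documents that `getRemoteHost()` may return the dotted IP instead of a name when the container chooses not to resolve hostnames for performance reasons, which matches the behavior seen in the table.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class RemoteHostResolver {
    // Hypothetical helper: turn a client IP into a hostname, mirroring what the
    // RAW transport gets from the accepted socket's InetAddress.
    static String resolveHostname(final String remoteAddr) {
        try {
            // For an IP literal, getByName only parses the address; getHostName()
            // then performs a reverse lookup and returns the IP text if that fails.
            return InetAddress.getByName(remoteAddr).getHostName();
        } catch (final UnknownHostException e) {
            // Not resolvable at all: keep whatever the request reported.
            return remoteAddr;
        }
    }

    public static void main(final String[] args) {
        // The result depends on local DNS / hosts configuration.
        System.out.println(resolveHostname("127.0.0.1"));
    }
}
```

A reverse lookup per request is not free, so the trade-off is some latency in exchange for `s2s.host` values that look the same for RAW and HTTP.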

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijokarumawak/nifi nifi-2585

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1342
    
----
commit c6f0621f985b6deddadc616307f192c1ddf1060d
Author: Randy Gelhausen <[email protected]>
Date:   2016-12-07T07:18:09Z

    NIFI-2585: Add attributes to track where a flow file came from when receiving over site-to-site

commit 841c4c80e42f09e7dc96552c661dde5a81ab7c08
Author: Bryan Bende <[email protected]>
Date:   2016-12-08T20:17:06Z

    NIFI-2585 Moving attributes into loop in AbstractFlowFileServerProtocol, and also updating StandardRemoteGroupPort to apply the same attributes when doing a pull-based site-to-site.

commit 0e3a29984d1abf94ba86d98fbe283e9671de1671
Author: Bryan Bende <[email protected]>
Date:   2016-12-12T14:19:22Z

    NIFI-2585 Adding checks in case host and port are not known

commit c206f266c38e537c9434d7ff66ad5e19ceb01e1c
Author: Koji Kawamura <[email protected]>
Date:   2016-12-20T02:19:22Z

    NIFI-2585: Add attributes to track s2s host and port
    
    - Removed host and port fields from Peer since the same information is
      available in PeerDescription
    - Refactored variable names in SocketRemoteSiteListener to improve
      readability
    - Changed how SocketRemoteSiteListener constructs the PeerDescription
      instance. It used to use a hard-coded 'localhost' as the hostname and
      getPort(), which returns the server's port. Since the peer is a remote
      peer, i.e. the client, it should be the client's hostname and port.
    - Added hostname resolution in DataTransferResource to make the s2s.host
      value consistent with the RAW transport. Without this, RAW uses the
      hostname while HTTP uses the IP address, which would make the value
      hard to use in downstream flows.

----


> Add attributes to track where a flow file came from when receiving over site-to-site
> ------------------------------------------------------------------------------------
>
>                 Key: NIFI-2585
>                 URL: https://issues.apache.org/jira/browse/NIFI-2585
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Bryan Bende
>            Assignee: Randy Gelhausen
>            Priority: Minor
>
> With MiNiFi starting to be used to send data to a central NiFi, it would be 
> helpful if information about the sending host and port was added to each flow 
> file received over site-to-site. Currently this information is available and 
> used to generate the transit URI in the RECEIVE event, but this information 
> isn't available to downstream processors that might want to make routing 
> decisions.
> For reference:
> https://github.com/apache/nifi/blob/e23b2356172e128086585fe2c425523c3628d0e7/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-site-to-site/src/main/java/org/apache/nifi/remote/protocol/AbstractFlowFileServerProtocol.java#L452
> A possible approach might be to add two attributes to each flow file, 
> something like "remote.host" and "remote.address" where remote.host has only 
> the sending hostname, and remote.address has the sending host and port.
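The two-attribute idea, one attribute with only the host and one with host and port, can be sketched as below. It includes an `unknown` fallback like the one described in the PR's table and in the "Adding checks in case host and port are not known" commit. The attribute names `s2s.host` and `s2s.address` are taken from this thread. `SiteToSiteAttributes.build` is a hypothetical helper, and the exact names and format of what was merged are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class SiteToSiteAttributes {
    // Names as used in this thread; the final merged names are an assumption.
    static final String HOST_ATTR = "s2s.host";
    static final String ADDRESS_ATTR = "s2s.address";
    static final String UNKNOWN = "unknown";

    // Hypothetical helper: build per-flow-file attributes from a peer's host and port.
    // host may be null/empty and port negative when they could not be determined.
    static Map<String, String> build(final String host, final int port) {
        final String h = (host == null || host.isEmpty()) ? UNKNOWN : host;
        final String p = port < 0 ? UNKNOWN : String.valueOf(port);

        final Map<String, String> attrs = new HashMap<>();
        attrs.put(HOST_ATTR, h);             // hostname only, handy for routing
        attrs.put(ADDRESS_ATTR, h + ":" + p); // host and port together
        return attrs;
    }
}
```

For example, `build("client_nifi", 58034)` yields `s2s.host=client_nifi` and `s2s.address=client_nifi:58034`, while `build(null, -1)` yields `s2s.address=unknown:unknown`.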



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
