[
https://issues.apache.org/jira/browse/NIFI-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763056#comment-15763056
]
ASF GitHub Bot commented on NIFI-2585:
--------------------------------------
GitHub user ijokarumawak opened a pull request:
https://github.com/apache/nifi/pull/1342
NIFI-2585 Add attributes to track s2s host and port
I left the four commits to show the history of how this evolved. We can
squash Bryan's two commits when merging, but we wanted Randy to get credit for
starting the work on this ticket.
While reviewing #1320, I found a few concerns, as shown in the table below.
Let's say there are `client.nifi` and `server.nifi` transferring files to each
other using S2S. This table shows how `s2s.address` attribute values are set in
each case with #1320:
| Transfer Protocol | Pulled from a remote OutputPort to S2S client | Received via an InputPort at S2S server |
|-------------------|-----------------------------------------------|--------------------------------------------|
| RAW | The server's hostname and port are the ones the client gets when it receives the remote NiFi peers. Those values are defined in the server's nifi.properties. <br/> `nifi.remote.input.host:nifi.remote.input.socket.port` <br/> e.g. `server.nifi:8081` | The client's hostname and port are used. However, because the existing logic parses the URL string to extract the hostname and port, a URL string containing an illegal character such as an underscore yields null, which is then converted to `unknown`. This happens with Docker, as container hostnames often contain underscores. <br/> e.g. `client.nifi:58034`, `unknown:unknown` (if hostname contains underscore) <br/> **FIX 1** <br/> `client_nifi:58034` (even if hostname contains underscore) |
| HTTP | Same as RAW protocol, but uses `nifi.web.http(s).port`. <br/> e.g. `server.nifi:8080` | The client hostname and port are retrieved from the HTTP request object. However, it returns the string representation of the IP address instead of the hostname, probably for performance reasons. <br/> e.g. `192.168.0.33:59946` <br/> **FIX 2** <br/> `client.nifi:59946` |
## FIX 1
As reported in [the previous PR
comment](https://github.com/apache/nifi/pull/1307#issuecomment-266442657), I got
'unknown' hostname and port with Docker containers. While this can be treated
as a corner case since underscores are not allowed by the URL specification, I
think it'd be better to be lenient here to support Docker environments. I
confirmed that the provenance event also failed to resolve the hostname and
produced provenance details with a null hostname.
This can be improved by changing Peer's constructor. Currently, it parses
peerUrl and then uses the resulting hostname and port, but the same information
can be retrieved from PeerDescription without parsing the URL. Also,
SocketRemoteSiteListener uses the server's hostname and port for
PeerDescription, which does not seem correct; those should be the client's.
This commit modifies SocketRemoteSiteListener to use PeerDescription instead
of parsing the URL string.
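To illustrate why parsing the URL loses the hostname, here is a minimal sketch, assuming the extraction is done with `java.net.URI` (which returns a null host and a port of -1 when the authority cannot be parsed as a server-based one, as happens with an underscore). `SimplePeerDescription`, `addressFromUrl`, and `addressFromDescription` are hypothetical names made up for this example, not NiFi classes or methods.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UnderscoreHostDemo {

    // Hypothetical stand-in for a description object that already carries
    // the remote host and port, so no URL parsing is needed.
    static final class SimplePeerDescription {
        private final String hostname;
        private final int port;

        SimplePeerDescription(String hostname, int port) {
            this.hostname = hostname;
            this.port = port;
        }

        String getHostname() { return hostname; }
        int getPort() { return port; }
    }

    // URL-based extraction: falls back to "unknown" when java.net.URI
    // cannot identify a host or port in the authority component.
    static String addressFromUrl(String peerUrl) throws URISyntaxException {
        URI uri = new URI(peerUrl);
        String host = uri.getHost() == null ? "unknown" : uri.getHost();
        String port = uri.getPort() < 0 ? "unknown" : String.valueOf(uri.getPort());
        return host + ":" + port;
    }

    // Description-based extraction: uses the values as given.
    static String addressFromDescription(SimplePeerDescription description) {
        return description.getHostname() + ":" + description.getPort();
    }

    public static void main(String[] args) throws URISyntaxException {
        // prints "client.nifi:58034"
        System.out.println(addressFromUrl("nifi://client.nifi:58034"));
        // prints "unknown:unknown" because of the underscore
        System.out.println(addressFromUrl("nifi://client_nifi:58034"));
        // prints "client_nifi:58034"
        System.out.println(addressFromDescription(
                new SimplePeerDescription("client_nifi", 58034)));
    }
}
```

The sketch only demonstrates the parsing behaviour; how the actual Peer and SocketRemoteSiteListener classes are wired together is shown in the PR diff itself.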
## FIX 2
This commit modifies DataTransferResource to resolve the hostname from the IP
address using InetAddress when the HTTP transport protocol is used, just as RAW
resolves the hostname from the socket using InetAddress.
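As a rough sketch of the kind of lookup FIX 2 describes, the following resolves an IP address string to a hostname with `java.net.InetAddress`. The class name `ResolveClientHost`, the method `resolveHostname`, and the fallback to the original string are assumptions for illustration; the actual change in DataTransferResource may differ.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveClientHost {

    // Resolves an IP address string (e.g. the remote address reported by an
    // HTTP request) to a hostname. If the address cannot be interpreted, the
    // original string is returned unchanged.
    static String resolveHostname(String remoteAddress) {
        try {
            // For a literal IP, getByName only validates the format;
            // getHostName() then performs the reverse lookup and returns the
            // textual IP address if no name is found.
            return InetAddress.getByName(remoteAddress).getHostName();
        } catch (UnknownHostException e) {
            return remoteAddress;
        }
    }

    public static void main(String[] args) {
        // The result depends on the local resolver configuration.
        System.out.println(resolveHostname("127.0.0.1") + ":" + 59946);
    }
}
```

Note that a reverse lookup can be slow when the resolver is unreachable, which is likely the performance concern mentioned in the table above.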
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ijokarumawak/nifi nifi-2585
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/1342.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1342
----
commit c6f0621f985b6deddadc616307f192c1ddf1060d
Author: Randy Gelhausen <[email protected]>
Date: 2016-12-07T07:18:09Z
NIFI-2585: Add attributes to track where a flow file came from when
receiving over site-to-site
commit 841c4c80e42f09e7dc96552c661dde5a81ab7c08
Author: Bryan Bende <[email protected]>
Date: 2016-12-08T20:17:06Z
NIFI-2585 Moving attributes into loop in AbstractFlowFileServerProtocol,
and also updating StandardRemoteGroupPort to apply the same attributes when
doing a pull-based site-to-site.
commit 0e3a29984d1abf94ba86d98fbe283e9671de1671
Author: Bryan Bende <[email protected]>
Date: 2016-12-12T14:19:22Z
NIFI-2585 Adding checks in case host and port are not known
commit c206f266c38e537c9434d7ff66ad5e19ceb01e1c
Author: Koji Kawamura <[email protected]>
Date: 2016-12-20T02:19:22Z
NIFI-2585: Add attributes to track s2s host and port
- Removed host and port field from Peer since the same information is
available in PeerDescription
- Refactored variable names in SocketRemoteSiteListener to improve
readability
- Changed how SocketRemoteSiteListener constructs PeerDescription
instance. It used to use hard-coded 'localhost' as hostname, and
getPort() which returns server's port. Since the peer is a remote peer,
i.e the client, it should be client hostname and port.
- Added hostname resolution at DataTransferResource to make s2s.host
value consistent with RAW transport. Without this, RAW uses hostname
while HTTP uses IP address. It will be hard to be used from downstream
flows.
----
> Add attributes to track where a flow file came from when receiving over
> site-to-site
> ------------------------------------------------------------------------------------
>
> Key: NIFI-2585
> URL: https://issues.apache.org/jira/browse/NIFI-2585
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Bryan Bende
> Assignee: Randy Gelhausen
> Priority: Minor
>
> With MiNiFi starting to be used to send data to a central NiFi, it would be
> helpful if information about the sending host and port was added to each flow
> file received over site-to-site. Currently this information is available and
> used to generate the transit URI in the RECEIVE event, but this information
> isn't available to downstream processors that might want to make routing
> decisions.
> For reference:
> https://github.com/apache/nifi/blob/e23b2356172e128086585fe2c425523c3628d0e7/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-site-to-site/src/main/java/org/apache/nifi/remote/protocol/AbstractFlowFileServerProtocol.java#L452
> A possible approach might be to add two attributes to each flow file,
> something like "remote.host" and "remote.address" where remote.host has only
> the sending hostname, and remote.address has the sending host and port.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)