Just spitballing a little here. If you set the PutTCP processor property
"Connection per Flowfile" to 'true' and leave "Outgoing Message Delimiter"
blank (none), then I don't think you have the delimiter problem that you
both are describing. I could be wrong, though.

I would consider it a bug if you couldn't send a "raw" connection-oriented
object over PutTCP.  With that processor, the goal would be to: a) open a
socket, b) dump whatever binary you have prepared over it, c) close the
socket to signal completion of transfer. If PutTCP doesn't work this way
(byte-for-byte), it should probably be flagged as a bug (its original
intention was exactly this use case).
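A rough Python sketch of that open/send/close pattern, assuming the receiver treats end-of-stream as the end of exactly one object. `send_raw` and `receive_raw` are hypothetical helper names (not NiFi code), and `socket.socketpair()` just stands in for the real PutTCP -> diode -> listener path:

```python
import socket


def send_raw(sock: socket.socket, payload: bytes) -> None:
    """Write the whole payload, then half-close so the peer sees EOF."""
    sock.sendall(payload)
    # Shutting down the write side is the "end of transfer" signal;
    # no delimiter is needed because the connection carries one object.
    sock.shutdown(socket.SHUT_WR)


def receive_raw(sock: socket.socket, bufsize: int = 65536) -> bytes:
    """Read until the peer closes its write side (recv returns b"")."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # A connected socket pair stands in for the real network path.
    sender, receiver = socket.socketpair()
    payload = bytes(range(256)) * 4  # binary data, including b"\n" bytes
    send_raw(sender, payload)
    sender.close()
    assert receive_raw(receiver) == payload
    receiver.close()
```

Because each connection carries one object, the content is passed through byte-for-byte and newline bytes in binary data never matter.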

That being said, I still think custom FlowFile serialization is probably
outside the concern of the transport; serializing/deserializing is a
separate concern from transport. Arguably, the semantics of the transport
protocol sometimes require you to prepare the message in a
protocol-accommodating way (HTTP being an obvious example of this, or packet
ordering in Marc's UDP example). But a new JSON flowfile serialization seems
like it could be a separate processor rather than being commingled into an
existing one.

MergeContent / UnpackContent work in tandem and have a "FlowFile Stream v3"
format that can serialize/deserialize multiple flowfiles together into a
single byte stream. This allows transport over any protocol, including
file-based, socket-based, etc.
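To illustrate why a self-describing container lets the transport stay "dumb", here is a simplified Python sketch that packs several flowfiles (attributes plus content) into one length-prefixed byte stream and unpacks them again. It is only modeled loosely on the idea behind FlowFile Stream v3 and does NOT reproduce its actual byte layout (no magic header, different length encoding); `pack` and `unpack` are hypothetical names:

```python
import io
import struct
from typing import Dict, Iterator, List, Tuple

FlowFile = Tuple[Dict[str, str], bytes]  # (attributes, content)


def _write_str(out: io.BytesIO, s: str) -> None:
    data = s.encode("utf-8")
    out.write(struct.pack(">I", len(data)))  # 4-byte big-endian length
    out.write(data)


def _read_exact(inp: io.BytesIO, n: int) -> bytes:
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated stream")
    return data


def _read_str(inp: io.BytesIO) -> str:
    (n,) = struct.unpack(">I", _read_exact(inp, 4))
    return _read_exact(inp, n).decode("utf-8")


def pack(flowfiles: List[FlowFile]) -> bytes:
    out = io.BytesIO()
    for attrs, content in flowfiles:
        out.write(struct.pack(">I", len(attrs)))  # attribute count
        for key, value in attrs.items():
            _write_str(out, key)
            _write_str(out, value)
        out.write(struct.pack(">Q", len(content)))  # 8-byte content length
        out.write(content)
    return out.getvalue()


def unpack(stream: bytes) -> Iterator[FlowFile]:
    inp = io.BytesIO(stream)
    while inp.tell() < len(stream):
        (count,) = struct.unpack(">I", _read_exact(inp, 4))
        attrs = {}
        for _ in range(count):
            key = _read_str(inp)
            attrs[key] = _read_str(inp)
        (size,) = struct.unpack(">Q", _read_exact(inp, 8))
        yield attrs, _read_exact(inp, size)
```

Since every field carries its own length, the resulting bytes can go over any transport (a file, one TCP connection, etc.) without delimiters.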

Marc: Your mention of performance is, of course, appropriate for the scale
you're talking about (Gbps). Maybe some performance improvements from your
work could be applied to the "standard" processors I mentioned. And I
definitely didn't mean to imply you were doing "anything wrong". I'm just
legitimately curious about your thought process and design approach.

OK, I'll step off a little, because I might be probing too hard here. But I
was legitimately curious about the intention of the proposed processor as
it relates to the diode devices you mentioned.

Thanks,

Adam


On Mon, Aug 2, 2021 at 4:15 PM Phil H <gippyp...@gmail.com> wrote:

> Hi Marc,
>
> Thanks for the additional info.  Just so you know you’re not the only
> one, I’ve also had to re-implement a ListenTCP alternative to get
> around the byte delimiter issue for binary and multiline text data.
>
> Phil
>
>
> On Tue, Aug 3, 2021 at 6:59 AM Marc <n...@nerdfunk.net> wrote:
> >
> > Hi Adam,
> >
> > More or less it is a 'merge', PutTCP, ListenTCP and unpack. I hope I am
> > not wrong, but the NiFi ListenTCP processor uses a delimiter (\n by
> > default?). If you are transferring binary data, the processor splits the
> > flow into 'pieces'. And the attributes are not transferred to the
> > destination.
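A toy Python model of delimiter-based framing shows how that splitting happens: any 0x0A byte inside binary content is treated as the end of a message. This is only an illustration of the effect, not ListenTCP's actual code; `split_on_delimiter` is a hypothetical name:

```python
from typing import List


def split_on_delimiter(stream: bytes, delimiter: bytes = b"\n") -> List[bytes]:
    """Mimic delimiter-based framing: every delimiter ends a 'message'."""
    parts = stream.split(delimiter)
    # A trailing delimiter leaves an empty tail; drop it like a framer would.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


# Binary content that happens to contain two 0x0A ("\n") bytes.
payload = bytes([0x89, 0x50, 0x0A, 0x1A, 0x0A, 0x00])
pieces = split_on_delimiter(payload + b"\n")
# One binary object arrives as three separate "messages".
assert len(pieces) == 3
```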
> >
> > But your idea describes what the processor is doing.
> >
> > 1. It converts the attributes to a JSON string.
> > 2. It transfers the JSON string and the payload (a header tells the
> >    destination how long the JSON header and the payload are).
> > 3. The listener receives the flow and decodes the header (to get the
> >    sizes of the JSON header and the payload).
> > 4. It writes the payload to a flowfile.
> > 5. It parses the JSON string and sets the attributes on the flowfile.
> >
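The five steps above can be sketched in Python roughly like this. The header layout (a 4-byte JSON length followed by an 8-byte payload length, both big-endian) is an assumption for illustration; the actual format used in nerdfunk-net/diode may differ, and `encode`/`decode` are hypothetical names:

```python
import json
import struct
from typing import Dict, Tuple

# Hypothetical fixed header: 4-byte JSON length + 8-byte payload length.
HEADER = struct.Struct(">IQ")


def encode(attributes: Dict[str, str], payload: bytes) -> bytes:
    # Step 1: attributes -> JSON string (as UTF-8 bytes).
    attr_json = json.dumps(attributes).encode("utf-8")
    # Step 2: header announcing both lengths, then JSON, then payload.
    return HEADER.pack(len(attr_json), len(payload)) + attr_json + payload


def decode(message: bytes) -> Tuple[Dict[str, str], bytes]:
    # Step 3: read the header to learn both sizes.
    json_len, payload_len = HEADER.unpack_from(message, 0)
    start = HEADER.size
    end = start + json_len + payload_len
    if len(message) < end:
        raise ValueError("message shorter than the header announced")
    attr_json = message[start:start + json_len]
    # Step 4: the payload becomes the flowfile content.
    payload = message[start + json_len:end]
    # Step 5: the JSON becomes the flowfile attributes.
    return json.loads(attr_json.decode("utf-8")), payload
```

Because the lengths travel in front of the data, the payload can contain any bytes, including newlines.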
> > If you do not want to transfer attributes, you can configure a different
> > decoder. In this case you can just netcat a binary file to NiFi.
> >
> > The UDP version is far more complex. There must be a counter to tell the
> > destination which part of the flowfile was received (even in a diode
> > environment, packets are not necessarily received in the right order!).
> > And you must be fast, very fast. It is a multithreaded architecture
> > because one thread cannot receive, decode, and write a gigabit per
> > second. I used the Disruptor library: one thread receives a packet,
> > another decodes it, and a third takes the decoded packet and writes the
> > content to a flowfile in the right order.
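The reordering job of that third thread can be sketched, single-threaded, in Python. This only illustrates sequence-number reassembly under the assumption that each datagram carries a counter starting at 0; it is not the Disruptor-based design, and `Reassembler` is a hypothetical name. On a one-way diode a gap can only be reported, never re-requested:

```python
from typing import Dict, List


class Reassembler:
    """Release datagram payloads in sequence order, buffering early arrivals."""

    def __init__(self) -> None:
        self.next_seq = 0                    # next counter value to write
        self.pending: Dict[int, bytes] = {}  # chunks that arrived too early
        self.output: List[bytes] = []        # stands in for the flowfile

    def accept(self, seq: int, chunk: bytes) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already written or already buffered
        self.pending[seq] = chunk
        # Flush every chunk that is now contiguous with what was written.
        while self.next_seq in self.pending:
            self.output.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def missing(self) -> List[int]:
        """Counter values still outstanding below the highest one seen."""
        if not self.pending:
            return []
        top = max(self.pending)
        return [s for s in range(self.next_seq, top) if s not in self.pending]


r = Reassembler()
for seq, chunk in [(2, b"c"), (0, b"a"), (3, b"d"), (1, b"b")]:
    r.accept(seq, chunk)
assert b"".join(r.output) == b"abcd"
```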
> >
> > I am still learning (and I am not a professional software developer). If
> > I did something wrong or overlooked something, please tell me.
> >
> > Marc
> >
> > > On 2 Aug 2021, at 22:01, Adam Taft <a...@adamtaft.com> wrote:
> > >
> > > Marc,
> > >
> > > How would this differ from a more generic use of the existing processors,
> > > PutTCP/ListenTCP and PutUDP/ListenUDP?  I'm not sure what value is being
> > > added beyond these existing processors, but I'm sure I'm missing
> > > something.
> > >
> > > There's already an ability to serialize flowfiles via MergeContent. And
> > > there's the deserialize side in UnpackContent. So a dataflow that looks
> > > like the following would seem a reasonable approach to the problem:
> > >
> > > MergeContent -> PutTCP -> {diode} -> ListenTCP -> UnpackContent
> > >
> > > I'm actually very interested in this topic, having a project that has a
> > > use case for a "diode". So I'm legitimately asking here, not trying to
> > > derail your work.
> > >
> > > Thanks in advance,
> > >
> > > Adam
> > >
> > > On Sun, Aug 1, 2021 at 12:26 PM Marc <n...@nerdfunk.net> wrote:
> > >
> > >> Greetings,
> > >>
> > >> There are companies and organizations that strictly separate their
> > >> networks for security reasons. Such companies often use diodes to achieve
> > >> this. But of course they still have to exchange data between the networks
> > >> (e.g., transfer data from 'low' to 'high'). There are at least two kinds
> > >> of diodes. Some hardware-based ones use only one optical fiber to send
> > >> data (UDP based). Others use TCP but prevent sending in the reverse
> > >> direction.
> > >>
> > >> NiFi is an amazing tool that allows data to be transferred between two
> > >> separate networks in a very flexible but also secure way. I have
> > >> implemented two processors. The first one 'merges' the attributes and the
> > >> content of a flowfile and sends it to the destination. The second one
> > >> listens on a TCP port, splits attributes and content, and creates a new
> > >> flowfile containing all attributes of the original flowfile. You can
> > >> send the flowfile without attributes as well. In this case you can
> > >> easily netcat a binary file to NiFi.
> > >>
> > >> These two processors are useful if you do NOT have bidirectional
> > >> communication between two NiFi instances and therefore cannot use the
> > >> Site-to-Site mechanism or HTTP(S).
> > >>
> > >> We have been using these processors for quite some time (specifically,
> > >> the version for 1.13.2) and would like to share them with others. So the
> > >> question to you all is: Is anyone interested in these processors, or is
> > >> this use case too specialized?
> > >>
> > >> The current source code can be found on GitHub
> > >> (https://github.com/nerdfunk-net/diode/).
> > >>
> > >> I have also implemented a UDP-based version of the processor. Due to the
> > >> nature of UDP, this is more complex, and these processors are now being
> > >> tested.
> > >>
> > >> Best regards
> > >> Marc
> >
>
