Hi Daniel,

We tested two different diodes (from two different manufacturers) over a long 
period. Both diodes count the UDP packets and log any loss. Additionally, we use 
hashes to detect any form of content error. We tested the two diodes for 
over a year and did not see any packet loss during this time. Both diodes 
were shipped as appliances, so each manufacturer knows exactly which 
hardware is in use. 
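
As a rough illustration of the counting and hashing idea, here is a minimal Python sketch. The datagram layout (8-byte sequence number, SHA-256 digest, payload) and the names `pack_datagram` and `LossTracker` are assumptions made for this example; they are not the format used by either appliance.

```python
import hashlib
import struct

# Hypothetical layout: 8-byte big-endian sequence number,
# 32-byte SHA-256 digest of the payload, then the payload itself.
HEADER = struct.Struct("!Q32s")


def pack_datagram(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number and content hash."""
    return HEADER.pack(seq, hashlib.sha256(payload).digest()) + payload


class LossTracker:
    """Counts gaps in the sequence numbers and hash mismatches.

    Simplification: a gap is counted as loss immediately, so a packet
    that merely arrives late would also have been counted as lost.
    """

    def __init__(self) -> None:
        self.expected = 0  # next sequence number we expect to see
        self.lost = 0      # datagrams skipped according to the counter
        self.corrupt = 0   # datagrams whose hash did not match

    def receive(self, datagram: bytes):
        """Return the payload, or None if the content hash does not match."""
        seq, digest = HEADER.unpack_from(datagram)
        payload = datagram[HEADER.size:]
        if seq > self.expected:
            self.lost += seq - self.expected
        self.expected = max(self.expected, seq + 1)
        if hashlib.sha256(payload).digest() != digest:
            self.corrupt += 1
            return None
        return payload
```

A receiver would feed every datagram into `receive` and log `lost` and `corrupt` periodically.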

Because the diodes proved this reliable, I initially decided against using 
Reed-Solomon codes (or OpenRQ, as another example). The effort seemed greater 
than the benefit. But maybe I'll have more time to look at this later, 
especially because I want to implement a diode based on NiFi, in which case I 
won't know the hardware used.

If you are interested in a very reliable diode, another concept might be 
interesting for you. A German manufacturer uses an L4 microkernel (similar to 
the seL4 microkernel) to prevent data from being sent back. Only ACKs without 
payload are allowed. This works like a charm and is very reliable.

Thanks again for your interest in my work. I appreciate that.

Marc

> On 03.08.2021 at 12:33, Daniel Chaffelson <chaffel...@gmail.com> wrote:
> 
> This is a very interesting area of integration investigation Marc, thank
> you for sharing your work!
> 
> I looked into this a little after conversations with folks in security
> applications, and I wonder if you investigated approaches to tracking and
> reporting/handling packet loss and error rates in this?
> The interest was in reasoning about loss rates and the completeness of
> received data - something that a simple merge>put>diode>get>unpack flow
> would not manage, I think.
> 
> I was looking at Longhair <https://github.com/catid/longhair>, and similar
> Reed-Solomon approaches, as a method of breaking down arbitrary files and
> transmitting them for reconstitution over diodes that may have lossy
> behavior in field scenarios.
> I also looked a little into transmitting manifests for downstream
> reconciliation, but this unravelled into a more complex operation than
> would suit a pure NiFi implementation, so I started on the path of
> Kafka/Flink as a streaming-reconciliation service but quickly realised I
> was creating a monster without commercial interest :)
> Both approaches are easier for fewer larger files than millions of tiny
> messages in terms of practicality, and if you had very reliable diode
> transmission the overhead of ECC/reconciliation may not be worthwhile.
> Other implementations I had seen (like ZeroMQ radio/dish or blindFTP)
> seemed to talk about provable delivery as a potential requirement, but I
> only found the more simplistic 'my network is reliable and any packet loss
> is negligible anyway' approaches. I suspect the implementations of these
> more robust approaches are reserved for commercial offerings...
> 
> Anyway, I appreciate that you may not be able to share more details on
> this, but you reminded me of enjoying the investigation when I looked at it
> so I thought I'd say thanks for that.
> 
> On Tue, Aug 3, 2021 at 2:55 AM Phil H <gippyp...@gmail.com> wrote:
> 
>> Adam, that's true, although if your data size is larger than network
>> MTU there can be some disconnect there.
>> 
>> Connection per flow file is pretty slow for sustained high traffic
>> flows though (can't recall the establishment times off the top of my
>> head, but they are non-trivial).
>> 
>> On Tue, Aug 3, 2021 at 8:39 AM Adam Taft <a...@adamtaft.com> wrote:
>>> 
>>> Just spitballing a little here. If you set the configuration of the PutTCP
>>> processor property "Connection per Flowfile" to 'true' and you leave the
>>> "Outgoing Message Delimiter" as blank (none), then I don't think you have
>>> the delimiter problem that you both are describing. I could be wrong though?
>>> 
>>> I would consider it a bug if you couldn't send a "raw" connection-oriented
>>> object over PutTCP.  With that processor, the goal would be to: a) open a
>>> socket, b) dump whatever binary you have prepared over it, c) close the
>>> socket to signal completion of transfer. If PutTCP doesn't work this way
>>> (byte-for-byte), it should probably be flagged as a bug (its original
>>> intention was exactly this use case).
>>> 
>>> That being said, I still think custom FlowFile serialization might be
>>> something that is outside of the concern of the transport. I personally
>>> think serializing/deserializing is a different concern from transport.
>>> Arguably, sometimes the semantics of the transport protocol require you to
>>> prepare the message itself in a protocol-accommodating way (HTTP being an
>>> obvious example of this, or packet ordering in Marc's UDP example). But a
>>> new JSON flowfile serialization seems like it could be a separate
>>> processor, not commingled into an existing one.
>>> 
>>> MergeContent / UnpackContent work in tandem and have a "FlowFile Stream v3"
>>> format that can serialize/deserialize multiple flowfiles together into a
>>> single byte stream. This allows transport over any protocol, including
>>> file-based, socket-based, etc.
>>> 
>>> Marc: Your mention of performance is, of course, appropriate for the scale
>>> that you're talking about (Gbps). Maybe there are some performance
>>> improvements that could be garnered from your work applicable to the
>>> "standard" processors I mentioned. And I definitely didn't mean to imply
>>> you were doing "anything wrong". Just legitimately curious as to your
>>> thought process and design approach.
>>> 
>>> OK, I'll step off a little, because I might be probing too hard here. But I
>>> was legitimately curious about the intention of the proposed processor as
>>> it relates to the mentioned Diode device.
>>> 
>>> Thanks,
>>> 
>>> Adam
>>> 
>>> 
>>> On Mon, Aug 2, 2021 at 4:15 PM Phil H <gippyp...@gmail.com> wrote:
>>> 
>>>> Hi Marc,
>>>> 
>>>> Thanks for the additional info.  Just so you know you’re not the only
>>>> one, I’ve also had to re-implement a ListenTCP alternative to get
>>>> around the byte delimiter issue for binary and multiline text data.
>>>> 
>>>> Phil
>>>> 
>>>> 
>>>> On Tue, Aug 3, 2021 at 6:59 AM Marc <n...@nerdfunk.net> wrote:
>>>>> 
>>>>> Hi Adam,
>>>>> 
>>>>> more or less it is a 'merge', PutTCP, ListenTCP and unpack. I hope that
>>>>> I am not wrong, but the NiFi ListenTCP processor uses a delimiter (\n as
>>>>> default?). If you are transferring binary data, the processor splits the
>>>>> flow into 'pieces'. And the attributes are not transferred to the
>>>>> destination.
>>>>> 
>>>>> But your idea describes what the processor is doing.
>>>>> 
>>>>> 1. It converts the attributes to a JSON string
>>>>> 2. It transfers the JSON string and the payload (there is a header that
>>>>>    tells the destination how long the JSON header and the payload are)
>>>>> 3. The listener gets the flow and decodes the header (to get the size of
>>>>>    the JSON header and the payload)
>>>>> 4. It writes the payload to a flow
>>>>> 5. It converts the JSON string and sets the attributes on the flow
>>>>> 
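
The five steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the header layout (4-byte JSON length, 8-byte payload length) and the function names `encode_flow` and `decode_flow` are hypothetical and may differ from what the processors in the repository actually use.

```python
import json
import struct
from typing import Dict, Tuple

# Hypothetical fixed-size header: 4-byte length of the JSON attribute
# block, followed by the 8-byte length of the payload (both big-endian).
FRAME_HEADER = struct.Struct("!IQ")


def encode_flow(attributes: Dict[str, str], payload: bytes) -> bytes:
    """Steps 1-2: serialize attributes to JSON and prepend both lengths."""
    attrs = json.dumps(attributes).encode("utf-8")
    return FRAME_HEADER.pack(len(attrs), len(payload)) + attrs + payload


def decode_flow(frame: bytes) -> Tuple[Dict[str, str], bytes]:
    """Steps 3-5: read the header, then split attributes and payload."""
    attr_len, payload_len = FRAME_HEADER.unpack_from(frame)
    start = FRAME_HEADER.size
    attrs = json.loads(frame[start:start + attr_len].decode("utf-8"))
    payload = frame[start + attr_len:start + attr_len + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated frame")
    return attrs, payload
```

Because the lengths are carried in the header, the payload can contain any byte, including `\n`, without being split by a delimiter.
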
>>>>> If you do not want to transfer attributes, you can configure a different
>>>>> decoder. In this case you can just 'netcat' a binary file to NiFi.
>>>>> 
>>>>> The UDP version is far more complex. There must be a counter to tell the
>>>>> destination which part of the flow file was received (even in a diode
>>>>> environment, packets are not received in the right order!). And you must
>>>>> be fast, very fast. It is a multithreaded architecture because one thread
>>>>> cannot receive, decode, and write a gigabit per second. I used the
>>>>> Disruptor library: one thread receives a packet, another thread decodes
>>>>> it, and a third thread takes the packet and writes the content in the
>>>>> right order to a flow.
>>>>> 
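
The role of the counter can be shown with a small reordering buffer. This is a single-threaded Python sketch with a hypothetical `Reassembler` class; it only illustrates the "write in the right order" step and does not reproduce the multithreaded Disruptor pipeline described above.

```python
from typing import Dict, List


class Reassembler:
    """Holds out-of-order chunks until the missing ones have arrived."""

    def __init__(self) -> None:
        self.next_seq = 0                    # next counter value to write
        self.pending: Dict[int, bytes] = {}  # chunks that arrived early

    def accept(self, seq: int, chunk: bytes) -> List[bytes]:
        """Store a chunk and return all chunks that can now be written in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate of something already seen
        self.pending[seq] = chunk
        ready: List[bytes] = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A real receiver would also need a timeout or size limit for `pending`, since a lost packet would otherwise block all later chunks forever.
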
>>>>> I am still learning (and I am not a professional software developer). If
>>>>> I did something wrong or overlooked something, please tell me.
>>>>> 
>>>>> Marc
>>>>> 
>>>>>> On 02.08.2021 at 22:01, Adam Taft <a...@adamtaft.com> wrote:
>>>>>> 
>>>>>> Marc,
>>>>>> 
>>>>>> How would this differ from a more generic use of the existing processors,
>>>>>> PutTCP/ListenTCP and PutUDP/ListenUDP?  I'm not sure what value is being
>>>>>> added above these existing processors, but I'm sure I'm missing something.
>>>>>> 
>>>>>> There's already an ability to serialize flowfiles via MergeContent. And
>>>>>> there's the deserialize side in UnpackContent. So a dataflow that looks
>>>>>> like the following would seem a reasonable approach to the problem:
>>>>>> 
>>>>>> MergeContent -> PutTCP -> {diode} -> ListenTCP -> UnpackContent
>>>>>> 
>>>>>> I'm actually very interested in this topic, having a project that has a
>>>>>> use case for a "diode". So I'm legitimately asking here, not trying to
>>>>>> derail your work.
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> Adam
>>>>>> 
>>>>>> On Sun, Aug 1, 2021 at 12:26 PM Marc <n...@nerdfunk.net> wrote:
>>>>>> 
>>>>>>> Greetings,
>>>>>>> 
>>>>>>> there are companies and organizations that strictly separate their
>>>>>>> networks for security reasons. Such companies often use diodes to achieve
>>>>>>> this. But of course they still have to exchange data between the networks
>>>>>>> (e.g. transfer data from 'low' to 'high'). There are at least two kinds of
>>>>>>> diodes. Some hardware-based ones use only a single optical fiber to send
>>>>>>> data (UDP based). Others use TCP, but prevent sending in the reverse
>>>>>>> direction.
>>>>>>> 
>>>>>>> NiFi is an amazing tool that allows data to be transferred between two
>>>>>>> separate networks in a very flexible but also secure way. I have
>>>>>>> implemented two processors. The first one 'merges' the attributes and the
>>>>>>> content of a flowfile and sends it to the destination. The second one
>>>>>>> listens on a TCP port, splits attributes and content, and creates a new
>>>>>>> flowfile containing all attributes of the original flow. You can send the
>>>>>>> flow without attributes as well. In this case you can easily netcat a
>>>>>>> binary file to NiFi.
>>>>>>> 
>>>>>>> These two processors are useful if you do NOT have bidirectional
>>>>>>> communication between two NiFi instances and therefore the site-to-site
>>>>>>> mechanism or HTTP(S) cannot be used.
>>>>>>> 
>>>>>>> We have been using these processors for quite some time (specifically,
>>>>>>> the version for 1.13.2) and would like to share these processors with
>>>>>>> others. So the question to you all is: Is anyone interested in these
>>>>>>> processors, or is this use case too specialized?
>>>>>>> 
>>>>>>> The current source code can be found on GitHub:
>>>>>>> https://github.com/nerdfunk-net/diode/
>>>>>>> 
>>>>>>> I have also implemented a UDP-based version of the processor. Due to the
>>>>>>> nature of UDP, this is more complex, and these processors are now being
>>>>>>> tested.
>>>>>>> 
>>>>>>> Best regards
>>>>>>> Marc
>>>>> 
>>>> 
>> 
