[
https://issues.apache.org/jira/browse/MINIFI-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936609#comment-15936609
]
Andrew Christianson commented on MINIFI-244:
--------------------------------------------
[~joewitt], your inquiry is totally on-base. This is a tricky one. I agonized
over this quite a bit, and dug into the semantics of Unpack/Merge from NiFi as
well as the latest notify/wait processors. In general, my aim for any MiNiFi
work is to remain as consistent with NiFi as possible. That said, if we have a
good case for doing something differently here, then it would be worthwhile to
port it to NiFi as well.
As for the purpose, it really boils down to a simple use-case: you take a tar
(or some other sufficiently "complex" object) into a flow, and you want to
manipulate one component of the complex object without having to break apart
the object or perform other contortions (splitting, re-merging, etc.). If we
can perform the transformation atomically on one data item within one
processor, it removes a lot of complexity: no merge correlations, no
error-handling complexity, and no complex notify/wait configurations.
For a completely concrete example, just consider performing an XSLT on one
entry in a tar.
The comparison is:
Option A (conventional): run the FlowFile through Unpack, resulting in multiple
flow files, route based on filename, run the target XML entry through
TransformXML, then route everything back to Merge. Use the defrag strategy to
re-create the original structure, while making sure to have the component
attributes exactly right. Will it all arrive on time? What happens if one
file, only incidentally extracted (in order to be able to re-create the tar),
fails? We can set a timeout on the merge, but what if processing is just slow
that day? Lots of questions arise. Does it work? Yes. A colleague familiar with
NiFi told me this works "90%" of the time.
Option B (with lens): run the FlowFile through a lens ("focusing" on one part
of an overall complex object), perform a transformation with a completely
standard/unmodified processor (TransformXML), then send it back through a lens
to get it back to its original state ("unfocus" the entry, "focus" the archive).
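To make Option B concrete, here is a minimal sketch in Python (not the MiNiFi
C++ implementation) of what the lens achieves end to end. The names
over_tar_entry and make_tar are hypothetical, chosen only for this
illustration. The sketch also collapses focus, transform, and unfocus into one
function call, whereas the proposal keeps them as separate steps around a
standard processor; the point is only that one entry changes while the rest of
the archive is carried along untouched.

```python
import io
import tarfile


def make_tar(files):
    """Build an in-memory tar from a {name: bytes} mapping (helper for the demo)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def over_tar_entry(archive_bytes, entry_name, transform):
    """Apply transform (bytes -> bytes) to one entry; copy every other entry as-is."""
    src = tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r")
    out_buf = io.BytesIO()
    with src, tarfile.open(fileobj=out_buf, mode="w") as dst:
        for member in src.getmembers():
            if not member.isfile():
                dst.addfile(member)  # directories, links: header only
                continue
            data = src.extractfile(member).read()
            if member.name == entry_name:
                data = transform(data)   # the "focused" entry
                member.size = len(data)  # header must match the new payload
            dst.addfile(member, io.BytesIO(data))
    return out_buf.getvalue()


# Usage: upper-case one entry as a stand-in for an XSLT transformation.
original = make_tar({"a.xml": b"<a/>", "b.txt": b"keep"})
result = over_tar_entry(original, "a.xml", bytes.upper)
```

Because the whole operation happens on one data item, there is nothing to
correlate, time out, or re-merge afterwards.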
Ultimately, the aim is to reduce the need for a complex (and error-prone) flow,
and avoid the temptation/need to write a custom processor every time there is a
need to manipulate one part of a complex structure, while preserving structure
(and doing it all atomically if possible).
As for the language and semantics, I agree with you in principle. It should all
be as user-centric as possible, and the language of the flow should be simple
even if the inner workings are necessarily complex. So, I'm all ears on how we
could label this thing. I don't think that the "extract" verb fits, and I think
the use-cases above should show why. Some other ideas (other suggestions are
welcome):
- FocusArchive (a little too general as we are technically focusing the entry,
not the archive)
- FocusArchiveEntry/UnfocusArchiveEntry (feels clunky?)
- ApplyArchiveLens (I kind of like this one: just the right amount of
generality and flexibility)
- ExposeArchiveElement/UnexposeArchiveElement
- RotateArchive (symmetrical; uses a geometric vs. optical analogy)
- InvertArchive (symmetrical)
I can't make it fit into any sort of extract/unpack verb, because that's just
not what it is. The basic purpose of the processor is to expose a part of a
greater whole, while explicitly not extracting or unpacking it. I think we
probably need a new (to NiFi) verb.
The other thing is, this is all rooted in theory (category theory). The theory
can be dense, but the blog entry linked in the description is focused more on
the practical aspect. I believe that the concepts of lenses (and monads) can
really help simplify some recurring dataflow design problems, even if we don't
necessarily expose the language to the user. Even though the theory is complex,
I think there is great value even for the typical NiFi user. This kind of
transformation task comes up all the time, and I want to make the language and
semantics as easy to understand as possible, while leveraging the benefits of
the advanced theory.
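For readers who have not met lenses before, the practical core is small. The
following is a rough Python sketch, not anything from NiFi or MiNiFi; the Lens
class and key_lens helper are hypothetical names, and a plain dict stands in
for the archive. A lens is just a paired getter and setter for one part of a
larger value, plus an "over" operation that applies a function to that part
and returns a new whole.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    view: Callable[[Any], Any]          # whole -> part
    update: Callable[[Any, Any], Any]   # (whole, new part) -> new whole

    def over(self, whole, fn):
        """Apply fn to the focused part and rebuild the whole around it."""
        return self.update(whole, fn(self.view(whole)))


def key_lens(key):
    """Lens focusing on one key of a dict; update returns a copy."""
    return Lens(
        view=lambda d: d[key],
        update=lambda d, v: {**d, key: v},
    )


# A dict stands in for an archive of named entries.
archive = {"a.xml": "<a/>", "b.txt": "keep"}
entry = key_lens("a.xml")
transformed = entry.over(archive, str.upper)

# The usual lens laws, checked on this example:
assert entry.view(entry.update(archive, "x")) == "x"        # read back what was written
assert entry.update(archive, entry.view(archive)) == archive  # writing back what was read changes nothing
```

The user-facing processor names do not need to mention any of this; the laws
are simply what guarantees that the rest of the structure survives the round
trip.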
> Create ArchiveLens processor
> ----------------------------
>
> Key: MINIFI-244
> URL: https://issues.apache.org/jira/browse/MINIFI-244
> Project: Apache NiFi MiNiFi
> Issue Type: Task
> Components: C++, Extensions
> Reporter: Andrew Christianson
> Assignee: Andrew Christianson
> Priority: Minor
>
> Create an ArchiveLens processor. A concise, though informal, definition of a
> lens is as follows:
> "Essentially, they represent the act of “peering into” or “focusing in on”
> some particular piece/path of a complex data object such that you can more
> precisely target particular operations without losing the context or
> structure of the overall data you’re working with."
> https://medium.com/@dtipson/functional-lenses-d1aba9e52254#.hdgsvbraq
> Why an ArchiveLens in MiNiFi? Simply put, it will enable us to "focus in on"
> an entry in the archive, perform processing *in-context* of that entry, then
> re-focus on the overall archive. This allows for transformation or other
> processing of an entry in the archive without losing the overall context of
> the archive.
> Initial format support is tar, due to its simplicity and ubiquity.