JoeS,

Sounds great. I'd ignore my provenance comment, as that was really more about how something external could keep tabs on progress, etc. Mark Payne designed and built the List/Fetch HDFS one, so I'll defer to him for the good bits. But the state-saving logic you'll want to follow is probably the same.
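
To make the state-saving idea concrete, here is a minimal sketch of a list step that remembers how far it got, plus a fetch step that needs only a path. This is not the List/FetchHDFS code: the function names, the JSON state file, and the use of modification time as the high-water mark are all assumptions for illustration. It also assumes the state file lives outside the directory being listed.

```python
import json
import os
from pathlib import Path


def load_state(state_file: Path) -> float:
    """Return the newest modification time already listed, or 0.0 if none."""
    if state_file.exists():
        return json.loads(state_file.read_text())["last_mtime"]
    return 0.0


def save_state(state_file: Path, last_mtime: float) -> None:
    """Persist the high-water mark so a restart does not re-list old files."""
    state_file.write_text(json.dumps({"last_mtime": last_mtime}))


def list_new_files(root: Path, state_file: Path) -> list[str]:
    """List step (hypothetical): emit paths modified after the saved mark."""
    last_mtime = load_state(state_file)
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = Path(dirpath) / name
            mtime = path.stat().st_mtime
            if mtime > last_mtime:
                found.append((mtime, str(path)))
    found.sort()  # oldest first, so the last entry carries the new mark
    if found:
        save_state(state_file, found[-1][0])
    return [p for _mtime, p in found]


def fetch_file(path: str) -> bytes:
    """Fetch step (hypothetical): needs only a path produced by the listing."""
    return Path(path).read_bytes()
```

The point of the split is that only the cheap listing has to run in one place and keep state, while the fetches can be spread across as many workers as needed. A strict "newer than the mark" comparison can miss a file that arrives later with the same timestamp as the mark, so a real implementation would need to handle that case.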
Mark - do you have the design of that thing documented anywhere? It is a good pattern to describe because it is effectively a model for taking non-scalable dataflow interfaces and making them behave as if they were.

Thanks
JoeW

On Wed, Jul 29, 2015 at 6:07 AM, Joe Skora <[email protected]> wrote:

> Joe,
>
> I'm interested in working on List/FetchFile. It seems like starting with
> [NIFI-631|https://issues.apache.org/jira/browse/NIFI-631] makes sense.
> I'll look at List/FetchHDFS, but is there any further detail on how this
> functionality should differ from GetFile? As for keeping state,
> provenance was suggested, a separate state folder might work, or some file
> systems support additional state that might be usable.
>
> Regards,
> Joe
>
> On Tue, Jul 28, 2015 at 12:42 AM, Joe Witt <[email protected]> wrote:
>
>> Anup,
>>
>> The two tickets in question appear to be:
>> https://issues.apache.org/jira/browse/NIFI-631
>> https://issues.apache.org/jira/browse/NIFI-673
>>
>> Neither have been claimed as of yet. Anybody interested in taking one
>> or both of these on? It would be a lot like List/Fetch HDFS so you'll
>> have good examples to work from.
>>
>> Thanks
>> Joe
>>
>> On Tue, Jul 28, 2015 at 12:37 AM, Sethuram, Anup
>> <[email protected]> wrote:
>> > Can I expect this functionality in the upcoming releases of Nifi ?
>> >
>> > On 13/07/15 9:13 am, "Sethuram, Anup" <[email protected]> wrote:
>> >
>> >>Where is this 1TB dataset living today?
>> >>[anup] Resides in a filesystem
>> >>
>> >>- What is the current nature of the dataset? Is it already in large
>> >>bundles as files or is it a series of tiny messages, etc..? Does it
>> >>need to be split/merged/etc..
>> >>[anup] Archived files of size 3MB each collected over a period. Directory
>> >>(1TB) -> Sub-Directories -> Files
>> >>
>> >>- What is the format of the data? Is it something that can easily be
>> >>split/merged or will it require special processes to do so?
>> >>[anup] zip, tar formats.
>> >>
>> >>--
>> >>View this message in context:
>> >>http://apache-nifi-incubating-developer-list.39713.n7.nabble.com/Fetch-change-list-tp1351p2126.html
>> >>Sent from the Apache NiFi (incubating) Developer List mailing list
>> >>archive at Nabble.com.
>> >>
>> >>________________________________
>> >>The information contained in this message may be confidential and legally
>> >>protected under applicable law. The message is intended solely for the
>> >>addressee(s). If you are not the intended recipient, you are hereby
>> >>notified that any use, forwarding, dissemination, or reproduction of this
>> >>message is strictly prohibited and may be unlawful. If you are not the
>> >>intended recipient, please contact the sender by return e-mail and
>> >>destroy all copies of the original message.
