Hi Mark, 

My team and I are working on a scenario similar to Anup's, but we're using 
SFTP rather than HDFS as the remote file source. 
I'm wondering if the 0.1.0 release will also include processors like 
ListSFTP and FetchSFTP that can keep state about what has already been 
pulled. We are thinking of implementing a custom processor just to do that. 
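
Roughly the behavior we'd want, sketched here in Python (the names and the
state format are made up for illustration; a local directory stands in for
the remote SFTP listing, which a real processor would get from an SFTP
client and persist via NiFi's state mechanism instead of a JSON file):

```python
import json
import os

def list_new_files(directory, state_path):
    """Return files not yet pulled, tracking state like a List* processor.

    The state records the newest modification time pulled so far, plus the
    names already pulled at exactly that time, so files sharing a timestamp
    are neither pulled twice nor skipped.
    """
    state = {"last_mtime": 0.0, "seen_at_last": []}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    new_files = []
    for name in sorted(os.listdir(directory)):
        mtime = os.path.getmtime(os.path.join(directory, name))
        if mtime > state["last_mtime"]:
            new_files.append(name)
        elif mtime == state["last_mtime"] and name not in state["seen_at_last"]:
            new_files.append(name)

    if new_files:
        # Advance the watermark to the newest mtime we just pulled.
        newest = max(os.path.getmtime(os.path.join(directory, n))
                     for n in new_files)
        seen = [n for n in new_files
                if os.path.getmtime(os.path.join(directory, n)) == newest]
        if newest == state["last_mtime"]:
            seen += state["seen_at_last"]
        state = {"last_mtime": newest, "seen_at_last": seen}
        with open(state_path, "w") as f:
            json.dump(state, f)

    return new_files
```

Each call returns only the files that appeared since the previous call, so
the source files can stay in place, which is exactly the delta behavior
we're after.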

Thanks! 
Owie 

----- Original Message -----

From: "Corey Flowers" <[email protected]> 
To: [email protected] 
Sent: Wednesday, May 6, 2015 12:05:48 AM 
Subject: Re: Fetch change list 

Wahoo! Thanks, Mark, for saving me on this one! 

Anup, before this release, it would not have been pretty to pull that delta 
off! :-) 

On Tue, May 5, 2015 at 11:39 AM, Mark Payne <[email protected]> wrote: 



Anup, 
With the 0.1.0 release that we are working on right now, there are two new 
processors, ListHDFS and FetchHDFS, that are able to keep state about what 
has been pulled from HDFS. This way you can keep the data in HDFS and still 
pull in only new data. Will this help? 
Thanks, 
Mark 

> From: [email protected] 
> To: [email protected] 
> Subject: RE: Fetch change list 
> Date: Tue, 5 May 2015 15:32:07 +0000 
> 
> Thanks, Corey, for that info. But the major problem I'm facing is that I'm 
> backing up a large set of data into HDFS (with a GetHDFS, 'Keep Source 
> File' set to true) and then trying to fetch the delta from it (getting only 
> the files that have arrived recently, by using the min age and max age). 
> But I'm unable to get the exact delta if I have 'Keep Source File' set to 
> true. 
> I played around a lot with the schedule time and min & max age, but it 
> didn't help. 
> 
> -----Original Message----- 
> From: Corey Flowers [mailto:[email protected]] 
> Sent: Tuesday, May 05, 2015 5:35 PM 
> To: [email protected] 
> Subject: Re: Fetch change list 
> 
> Ok, the GetFile that is running is basically causing a race condition 
> between all of the servers in your cluster. That is why you are seeing the 
> "NoSuchFile" error. If you change the scheduling strategy on that processor 
> to "On primary node", then the only system that will try to pick up data 
> from that mount point is the server you have designated as the "primary 
> node". This should fix that issue. 
> 
> On Mon, May 4, 2015 at 11:30 PM, Sethuram, Anup 
> <[email protected]> wrote: 
> 
> > Yes, Corey. Right now the pickup directory is on a network share 
> > mount point. The data is picked up from one location and transferred 
> > to the other. I'm using site-to-site communication. 
> > 
> > -----Original Message----- 
> > From: Corey Flowers [mailto:[email protected]] 
> > Sent: Monday, May 04, 2015 7:57 PM 
> > To: [email protected] 
> > Subject: Re: Fetch change list 
> > 
> > Good morning Anup! 
> > 
> > Is the pickup directory coming from a network share mount point? 
> > 
> > On Mon, May 4, 2015 at 10:11 AM, Sethuram, Anup 
> > <[email protected]> wrote: 
> > 
> > > Hi, 
> > > I'm trying to fetch a set of files that have 
> > > recently changed on a filesystem, while keeping the 
> > > original copies as they are. 
> > > To obtain the latest files that have changed, I'm using a 
> > > PutFile with the "replace" conflict-resolution strategy piped to a 
> > > GetFile with a minimum file age of 5 sec, a maximum file age of 
> > > 30 sec, and 'Keep Source File' set to true. 
> > > 
> > > I'm also running this in clustered mode, and I'm seeing the issues below: 
> > > 
> > > - The queue starts growing if there's an error. 
> > > 
> > > - Continuous errors with 'NoSuchFileException' 
> > > 
> > > - Penalizing StandardFlowFileErrors 
> > > 
> > > 
> > > 
> > > 
> > > 18:45:56 IST ERROR 0ab3b920-1f05-4f24-b861-4fded3d5d826 161.91.234.248:7087 
> > > GetFile[id=0ab3b920-1f05-4f24-b861-4fded3d5d826] Failed to retrieve 
> > > files due to 
> > > org.apache.nifi.processor.exception.FlowFileAccessException: Failed 
> > > to import data from /nifi/UNZ/log201403230000.log for 
> > > StandardFlowFileRecord[uuid=f29bda59-8611-427c-b4d7-c921ee5e74b8,claim=,offset=0,name=6908587554457536,size=0] 
> > > due to java.nio.file.NoSuchFileException: 
> > > /nifi/UNZ/log201403230000.log 
> > > 
> > > 10:54:50 IST ERROR c552b5bc-f627-3cc3-b3d0-545c519eafd9 161.91.234.248:6087 
> > > PutFile[id=c552b5bc-f627-3cc3-b3d0-545c519eafd9] Penalizing 
> > > StandardFlowFileRecord[uuid=876e51f7-9a3d-4bf9-9d11-9073a5c950ad,claim=1430717088883-73580,offset=0,name=file1.log,size=29314779] 
> > > and transferring to failure due to 
> > > org.apache.nifi.processor.exception.ProcessException: Could not 
> > > rename /nifi/UNZ/.file1.log: 
> > > org.apache.nifi.processor.exception.ProcessException: 
> > > Could not rename: /nifi/UNZ/.file1.log 
> > > 
> > > 10:54:56 IST ERROR 60662bb3-490a-3b47-9371-e11c12cdfa1a 161.91.234.248:7087 
> > > PutFile[id=60662bb3-490a-3b47-9371-e11c12cdfa1a] Penalizing 
> > > StandardFlowFileRecord[uuid=522a2401-8269-4f0f-aff5-152d25cdcefa,claim=1430717094668-73059,offset=1533296,name=file2.log,size=28014262] 
> > > and transferring to failure due to 
> > > org.apache.nifi.processor.exception.ProcessException: Could not rename: 
> > > /data/softwares/RS/nifi/OUT/.file2.log: 
> > > org.apache.nifi.processor.exception.ProcessException: Could not rename: 
> > > /nifi/OUT/.file2.log 
> > > 
> > > 
> > > 
> > > Do I have to tweak the run schedule, or adjust the minimum and 
> > > maximum file ages, to overcome this issue? 
> > > What might be an elegant solution in NiFi? 
> > > 
> > > 
> > > Thanks, 
> > > anup 
> > > 
> > > ________________________________ 
> > > The information contained in this message may be confidential and 
> > > legally protected under applicable law. The message is intended 
> > > solely for the addressee(s). If you are not the intended recipient, 
> > > you are hereby notified that any use, forwarding, dissemination, or 
> > > reproduction of this message is strictly prohibited and may be 
> > > unlawful. If you are not the intended recipient, please contact the 
> > > sender by return e-mail and destroy all copies of the original 
message. 
> > > 
> > 
> > 
> > 
> > -- 
> > Corey Flowers 
> > Vice President, Onyx Point, Inc 
> > (410) 541-6699 
> > [email protected] 
> > 
> > -- This account not approved for unencrypted proprietary information 
> > -- 
> > 
> > 
> 
> 
> 
> 





