Ok, one more question... On GetHDFS, are you setting the Directory to "/landing/databasename/prodeiw_arc/" and then setting Recurse Subdirectories to true to have it go into each table's directory?
The reason I ask is that the FlowFiles coming out of GetHDFS have an attribute on them called "path". The documentation says:

    The path is set to the relative path of the file's directory on HDFS.
    For example, if the Directory property is set to /tmp, then files
    picked up from /tmp will have the path attribute set to "./". If the
    Recurse Subdirectories property is set to true and a file is picked up
    from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3".

So theoretically, if you were pointing to "/landing/databasename/prodeiw_arc/" and it recursed into "/landing/databasename/prodeiw_arc/tablename", the path attribute would end up being "tablename". You could then reference this in your PutHDFS processor by setting the Directory to "/landing/teradata/compressed/prodeiw_arc/${path}".

On Wed, Apr 6, 2016 at 8:46 AM, jamesgreen <[email protected]> wrote:
> Hi Brian, thanks for the help!
>
> I have tried two ways:
>
> a.
> 1. I use GetHDFS to retrieve data from HDFS, then I use PutHDFS and set
> the compression to GZIP.
> 2. In the Directory I am putting the complete path, i.e.
> /landing/teradata/compressed/prodeiw_arc
>
> b.
> 1. I use GetHDFS to retrieve data from HDFS, then I use CompressContent
> to apply the compression, and then use PutHDFS.
> 2. In the Directory I am putting the complete path, i.e.
> /landing/teradata/compressed/prodeiw_arc
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Compression-of-Data-in-HDFS-tp8821p8825.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
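[Editor's note: the relative-path behavior described above can be sketched in a few lines. This is an illustrative approximation of how GetHDFS derives the "path" attribute, not the actual NiFi implementation; the function name and the sample file paths are hypothetical.]

```python
import posixpath

def path_attribute(directory, file_path):
    """Roughly how GetHDFS sets the 'path' attribute: the file's
    parent directory, relative to the configured Directory property."""
    rel = posixpath.relpath(posixpath.dirname(file_path), directory)
    return "./" if rel == "." else rel

# File picked up directly from the configured Directory:
print(path_attribute("/tmp", "/tmp/file.txt"))            # "./"

# File picked up recursively:
print(path_attribute("/tmp", "/tmp/abc/1/2/3/file.txt"))  # "abc/1/2/3"

# What a PutHDFS Directory of "/landing/teradata/compressed/prodeiw_arc/${path}"
# would expand to for a file under a table directory (sample paths assumed):
attr = path_attribute("/landing/databasename/prodeiw_arc",
                      "/landing/databasename/prodeiw_arc/tablename/part-0001")
print("/landing/teradata/compressed/prodeiw_arc/" + attr)
# "/landing/teradata/compressed/prodeiw_arc/tablename"
```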
