Ok one more question...

On GetHDFS are you setting the Directory to
"/landing/databasename/prodeiw_arc"
and then setting Recurse Subdirectories to true to have it go into each
table's directory?

The reason I ask is that the FlowFiles coming out of GetHDFS have an
attribute on them called "path"; the documentation says:

The path is set to the relative path of the file's directory on HDFS. For
example, if the Directory property is set to /tmp, then files picked up
from /tmp will have the path attribute set to "./". If the Recurse
Subdirectories property is set to true and a file is picked up from
/tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3"
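A quick way to see what the documentation is describing is to compute the attribute by hand. This is not NiFi's code, just a minimal sketch of the relative-path rule using Python's posixpath (the helper name is mine):

```python
import posixpath

def hdfs_path_attribute(directory, file_path):
    """Sketch of how GetHDFS derives the 'path' attribute: the file's
    parent directory, relative to the configured Directory property."""
    rel = posixpath.relpath(posixpath.dirname(file_path), directory)
    return "./" if rel == "." else rel

print(hdfs_path_attribute("/tmp", "/tmp/file.txt"))            # ./
print(hdfs_path_attribute("/tmp", "/tmp/abc/1/2/3/file.txt"))  # abc/1/2/3
```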

So theoretically, if you were pointing to "/landing/databasename/prodeiw_arc"
and it recursed into "/landing/databasename/prodeiw_arc/tablename",
the path attribute would end up being "tablename".

You could then reference this attribute in your PutHDFS processor by setting
the Directory to "/landing/teradata/compressed/prodeiw_arc/${path}"
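For clarity, here is roughly how that ${path} reference would resolve against the FlowFile's attributes. This is only a toy substitution sketch, not NiFi's actual Expression Language engine (which supports far more than plain attribute lookup):

```python
import re

def resolve(template, attributes):
    """Toy substitution (hypothetical helper, not a NiFi API): replace
    each ${name} with the matching FlowFile attribute value."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: attributes.get(m.group(1), ""), template)

directory = resolve("/landing/teradata/compressed/prodeiw_arc/${path}",
                    {"path": "tablename"})
print(directory)  # /landing/teradata/compressed/prodeiw_arc/tablename
```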



On Wed, Apr 6, 2016 at 8:46 AM, jamesgreen <[email protected]>
wrote:

> Hi Brian, Thanks for the help!
>
> I have tried two ways:
>
> a.
> 1. I use GetHDFS to retrieve data from HDFS, then use PutHDFS and set
>    the compression to GZIP.
> 2. In the Directory I am putting the complete path, i.e.
>    /landing/teradata/compressed/prodeiw_arc
>
> b.
> 1. I use GetHDFS to retrieve data from HDFS, then use CompressContent
>    to apply the compression, and then use PutHDFS.
> 2. In the Directory I am putting the complete path, i.e.
>    /landing/teradata/compressed/prodeiw_arc
>
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Compression-of-Data-in-HDFS-tp8821p8825.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>