Hi,

I really like the virtual column feature in 0.7 that lets me select 
INPUT__FILE__NAME and see the names of the files being read.  
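
For context, here is roughly how I am using it (the table name is just a 
placeholder for my real table):

  SELECT INPUT__FILE__NAME, col_a
  FROM my_s3_table
  LIMIT 100;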

Because I can see which files are being read, I can tell that I am spending 
time querying many very large files, most of which I do not need to process; 
these extra files just happen to live in the same S3 bucket location as the 
files I do need.  

The files I do need to process represent only a subset of all the files in 
the bucket. Even so, the files I am interested in are large enough that 
copying them to HDFS is unwieldy. 

Since I know, by name, which files I want to process before the scan of the 
bucket starts, can I be more efficient and process only that selection of 
files, skipping the ones I don't need?

  
I guess I am still looking for something like 
https://issues.apache.org/jira/browse/HIVE-951 


Any suggestions, or an update on the status of HIVE-951?


Thanks,
Avram
