Thanks a lot for the quick reply, Ashish. The files are numerous, so they are currently arranged by functional category across multiple folders in HDFS. Is there any workaround to support multiple folders?
-KK.

----- Original Message ----
From: Ashish Thusoo <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wed, May 26, 2010 11:03:43 AM
Subject: RE: Query HDFS files without using LOAD (move)

You could probably use external tables?? CREATE EXTERNAL TABLE allows you to create tables on hdfs files but I do not think that it takes file patterns / regex. If all the files are created within a directory then you could point the external table to the directory location and then querying on that table would automatically query all the files in that directory. Are your files in a single directory or are they spread out?

http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table

Ashish

-----Original Message-----
From: Karthik [mailto:[email protected]]
Sent: Wednesday, May 26, 2010 10:45 AM
To: [email protected]
Subject: Query HDFS files without using LOAD (move)

Is there a way where I can specify a list of files (or file pattern / regex) from a HDFS location other than the Hive Warehouse as a parameter to a Hive Query? I have a bunch of files that are used by other applications as well and I need to perform queries on those as well using Hive and so I do not want to use LOAD and move those files on to Hive warehouse from the original location. My query is on incremental data (new files) that are added on a daily basis and need not use the full list of files on a folder and so I need to specify a list of file / pattern, something like a filter of files to the query. Please suggest.

- KK.
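
For reference, the external-table approach Ashish describes might look roughly like the sketch below. The table names, columns, delimiter, and HDFS paths are all hypothetical and would need to match the real files. The second table is one possible way to cover files spread over several folders, assuming each folder can be registered as a partition of a partitioned external table; nobody in this thread has confirmed that it fits this setup.

```sql
-- Hypothetical names, columns, delimiter, and paths, for illustration only.

-- Single directory: the table points at the folder, and queries read
-- every file in it. The files stay where they are (no LOAD, no move).
CREATE EXTERNAL TABLE app_logs (
  event_time STRING,
  message    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/app/logs';

SELECT COUNT(1) FROM app_logs;

-- Multiple directories (assumed workaround): one partition per folder,
-- each with its own LOCATION.
CREATE EXTERNAL TABLE app_logs_by_category (
  event_time STRING,
  message    STRING
)
PARTITIONED BY (category STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

ALTER TABLE app_logs_by_category ADD PARTITION (category = 'billing')
  LOCATION '/data/app/billing';
ALTER TABLE app_logs_by_category ADD PARTITION (category = 'search')
  LOCATION '/data/app/search';

-- Filtering on the partition column restricts the query to that folder.
SELECT COUNT(1) FROM app_logs_by_category WHERE category = 'billing';
```

Because both tables are EXTERNAL, dropping them removes only the Hive metadata and leaves the underlying HDFS files in place for the other applications. This still selects whole folders rather than individual files or patterns, so daily incremental data would need its own folder (for example, a per-day partition) to be queried separately.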
