On Tue, Jul 16, 2013 at 2:45 AM, Subroto Sanyal <sanyalsubr...@gmail.com> wrote:

> Thanks Alan,
>
> Just another thought.
> How about using a different InputFormat like: STORED as INPUTFORMAT
> com.myproject.MyOwnInputFormat ?
> Which is the best approach and why?
>

Hive and HCat divide the file format into two parts:

  serde       - translates each row to a sequence of bytes
  file format - takes a set of rows (as bytes from the serde) and
                writes them to disk

For text formats, the serde controls how the row is serialized and the
text file format puts newlines at the end of each row. So the question is
whether you are trying to control the serialization, the file container,
or both.

Note that newer file formats like ORC and Parquet combine the serde and
format because the serialization is integrated with the file format.

-- Owen

> Down the line I would like to read the table from Pig as well.
>
>
> On Mon, Jul 15, 2013 at 7:12 PM, Alan Gates <ga...@hortonworks.com> wrote:
>
>> All you need to do is write a Hive SerDe. There is some documentation at
>> https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide. Also
>> you can use existing SerDes in Hive as an example.
>>
>> Alan.
>>
>> On Jul 5, 2013, at 8:06 AM, Subroto Sanyal wrote:
>>
>> > Hi,
>> >
>> > Newbie question...
>> > I have my own file format. The files are saved on HDFS. I would like
>> > HCatalog to make those files readable by Hive.
>> > Something like:
>> >
>> > Hive
>> > |
>> > HCatalog
>> > |
>> > MyFiles
>> >
>> > Where should I start?
>> >
>> > Is there any sample integration of other file formats which I can use
>> > as a reference?
>> >
>> >
>> > --
>> > Cheers,
>> > Subroto Sanyal
>>
>
> --
> Cheers,
> *Subroto Sanyal*
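
To make the serde / file format split from Owen's reply concrete, here is a
minimal sketch in Python. It is only an illustration of which layer owns
which decision. It does not use Hive's Java SerDe or InputFormat interfaces,
and every function name and the tab delimiter are assumptions made up for
this example.

```python
import io

# Hypothetical names throughout; this is NOT Hive's API, only a model of
# the two layers described in the reply above.

FIELD_SEP = b"\t"  # assumed delimiter, chosen only for this sketch


def serialize_row(row):
    """Serde role: translate one row into a sequence of bytes."""
    return FIELD_SEP.join(str(value).encode("utf-8") for value in row)


def deserialize_row(data):
    """Serde role (read side): translate bytes back into column values."""
    return [field.decode("utf-8") for field in data.split(FIELD_SEP)]


def text_format_write(out, serialized_rows):
    """File format role: take rows (already bytes) and write them out,
    putting a newline at the end of each row."""
    for row_bytes in serialized_rows:
        out.write(row_bytes + b"\n")


def text_format_read(inp):
    """File format role (read side): split the container back into rows,
    leaving the row bytes for the serde to interpret."""
    for line in inp:
        yield line.rstrip(b"\n")


if __name__ == "__main__":
    buf = io.BytesIO()
    rows = [[1, "alice"], [2, "bob"]]
    text_format_write(buf, (serialize_row(r) for r in rows))
    buf.seek(0)
    for row_bytes in text_format_read(buf):
        print(deserialize_row(row_bytes))
```

In this model, changing how a row is encoded touches only the two serde
functions, while changing how rows are laid out on disk touches only the two
file format functions. A custom format that needs both would replace both
pairs.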