Dear hive-users,

I've written my own custom SerDe to handle some log files in a custom format. As I'd like to (eventually) use the JDBC driver down the line, I want to retain the column types for the output. Part of the reason for writing a SerDe is that we're using OpenCSV (http://opencsv.sourceforge.net/) to produce the files in the first place, so it'd be good to use it again to parse them when they are queried in Hive.
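To make the goal concrete, here is a rough sketch of the per-row conversion I'm aiming for. It is not my actual SerDe: the class and method names are hypothetical, it uses no Hive or OpenCSV classes, and a naive split on commas stands in for OpenCSV's parser. I'm also assuming the table properties arrive as a comma-separated "columns" list and a colon-separated "columns.types" list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration only; no Hive or OpenCSV APIs are involved.
final class TypedCsvRowSketch {
    private final List<String> columnNames = new ArrayList<String>();
    private final List<String> columnTypes = new ArrayList<String>();

    // Assumption: "columns" is comma-separated and "columns.types" is colon-separated.
    TypedCsvRowSketch(Properties tbl) {
        for (String name : tbl.getProperty("columns", "").split(",")) {
            if (!name.isEmpty()) columnNames.add(name.trim());
        }
        for (String type : tbl.getProperty("columns.types", "").split(":")) {
            if (!type.isEmpty()) columnTypes.add(type.trim());
        }
    }

    // Turns one text line into a row of typed values, one per declared column.
    // A plain split is used here; it does not handle quoted fields the way a CSV parser would.
    List<Object> deserialize(String line) {
        String[] fields = line.split(",", -1);
        List<Object> row = new ArrayList<Object>(columnTypes.size());
        for (int i = 0; i < columnTypes.size(); i++) {
            // Columns missing from a short line become null.
            row.add(i < fields.length ? convert(fields[i], columnTypes.get(i)) : null);
        }
        return row;
    }

    // Converts a raw field according to its declared type; unparsable numbers become null.
    private static Object convert(String raw, String type) {
        try {
            if ("int".equals(type)) return Integer.valueOf(raw.trim());
            if ("bigint".equals(type)) return Long.valueOf(raw.trim());
            if ("double".equals(type)) return Double.valueOf(raw.trim());
            return raw; // everything else is kept as a string in this sketch
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

In the real SerDe, a row like this would be what deserialize() hands back, paired with a struct ObjectInspector built from the same column names and types.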
I've implemented my own SerDe, originally using MetadataTypedColumnsetSerDe as a basis. However, whenever I run a query, no data is returned, regardless of how much data I load into the table. The load itself proceeds fine. I am using the version of Hive from Cloudera's CDH3 distribution (based on 0.5.0).

My create table statement is:

    CREATE TABLE my_test_table (col_name_1 STRING, col_name_2 INT, ... etc)
    COMMENT 'Some comment'
    PARTITIONED BY (part_col_1 STRING, part_col_2 STRING)
    ROW FORMAT SERDE "com.my.package.named.MyNewSerDe"
    STORED AS TEXTFILE;

I have switched on debug logging and put a bunch of debug statements in my code. When I run a simple query (like "select * from my_test_table limit 10;") so that it runs locally, Hive does find my class: it calls the initialize method and then calls the getObjectInspector method a number of times. Subsequently, though, it calls initialize on LazySimpleSerDe three times. The first two times it has dummy column names (_col0) and the correct column types in the correct order. The last time it has no column names or types at all.

Presumably I'm missing something fairly simple somewhere (a missing class extension, the wrong class returned by getSerializedClass(), or perhaps an incorrectly constructed ObjectInspector?), but for the life of me I can't spot it. The underlying files are just CSVs constructed using the OpenCSV library above.

I'd be very grateful for any suggestions.

Thanks,
Jamie