Hi,

If those logs are being printed to your console, I believe you are running the job locally rather than on the cluster. I would make sure you have the correct Hadoop configs on your classpath.
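
One quick way to check is from the Hive CLI itself. This is only a rough sketch, and which property matters depends on whether you are on YARN or MR1:

  -- On YARN / Hadoop 2 this should print "yarn", not "local"
  set mapreduce.framework.name;
  -- On MR1 this should print your JobTracker address, not "local"
  set mapred.job.tracker;
  -- If this is true, Hive may quietly run small jobs in-process
  set hive.exec.mode.local.auto;

If those come back as "local", point HADOOP_CONF_DIR (and HIVE_CONF_DIR) at the directory holding your cluster's *-site.xml files before starting Hive.
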
brock

On Wed, Oct 1, 2014 at 2:54 AM, Sean Violante <[email protected]> wrote:

> I am just starting to use parquet, and I have hit a big problem.
>
> CREATE TABLE if not exists demand_parquet
> (account int, site int, carrier int, cluster string, os string, token
> string, idfa string, dpid_sha1 string, bid_requests int)
> partitioned by (date string)
> clustered by (dpid_sha1) into 256 buckets
> stored as parquet;
>
> I insert with this statement
>
> FROM demand_carrier
> INSERT OVERWRITE TABLE sfr_demand_carrier_parquet
> PARTITION (date)
> SELECT date, account, site, carrier, cluster, os, token, idfa,
> dpid_sha1, bid_requests;
>
> everything seems to work fine (my hadoop job completes)
>
> But on the hive console output I see a neverending log (over 12 hours and
> counting). If I stop hive (during this logging phase) then the parquet file
> is destroyed.
>
> I assume I must be doing something fundamentally wrong...
> Presumably no one else has this excess logging issue, yet it seems like
> there is no way of turning the logging off without recompiling parquet....
> What am I missing?
>
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.InternalParquetRecordWriter: Flushing mem store to file. allocated memory: 47,605,338
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [account] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [site] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [carrier] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [cluster] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [os] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [token] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Sep 30, 2014 3:39:40 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [idfa] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
>
> .......
>
> Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.InternalParquetRecordWriter: Flushing mem store to file. allocated memory: 47,605,338
> Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [account] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [site] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [carrier] INT32: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [cluster] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [os] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
> Oct 1, 2014 11:42:59 AM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 0B for [token] BINARY: 0 values, 0B raw, 0B comp, 0 pages, encodings: []
>
> in the individual reduce logs I see
>
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 23B for [account] INT32: 1 values, 6B raw, 6B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 27B for [site] INT32: 1 values, 10B raw, 10B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 27B for [carrier] INT32: 1 values, 10B raw, 10B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 30B for [cluster] BINARY: 1 values, 13B raw, 13B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 30B for [os] BINARY: 1 values, 13B raw, 13B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 34B for [token] BINARY: 1 values, 17B raw, 17B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 38B for [idfa] BINARY: 1 values, 21B raw, 21B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 27B for [dpid_sha1] BINARY: 1 values, 10B raw, 10B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:39 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 23B for [bid_requests] INT32: 1 values, 6B raw, 6B comp, 1 pages, encodings: [BIT_PACKED, PLAIN, RLE]
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
> Sep 30, 2014 12:44:40 PM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
