This is an automated email from the ASF dual-hosted git repository.

blue pushed a change to branch parquet-1.10.x
in repository https://gitbox.apache.org/repos/asf/parquet-mr.git.
at 8ad44a9 [maven-release-plugin] prepare release apache-parquet-1.10.1

This branch includes the following new commits:

new 576c709 Initial commit
new a8c10ef initial commit
new f3cdad3 README.md
new 2402879 Update README.md
new dbbf944 updated name
new 7ed9528 Merge branch 'master' of github.com:julienledem/redelm
new 5fb63e9 refactoring tuples to use indices instead of field names
new 741c6fd adding push parser test
new a1a04d5 move closing records at the begining of the loop
new 9b68b39 make field started and ended only once when value is repeated, refactor out SimpleGroupRecordConsumer
new 34124ed remove depedency on Group from RecordReader
new 968fd55 remove dependency on group from the io implementation
new c4109e2 fix compilation and dependency issue
new f689f90 fix compilation issue
new 82e0324 pig Tuple consumer and writer
new 5e95561 temporarily removing thrift stuff
new f2ea25f turning off logs
new d1ceb26 Update README.md
new da7ec48 adding license header
new 0b73962 Merge branch 'master' of github.com:julienledem/redelm
new 7b98856 adding travis ci conf
new e6ecdc6 fixing source version
new cf5ba98 fix source encoding
new ab7df25 added build info
new 2302fe4 first version of the Loader/Storer
new 575c221 PrintFooter tool
new bdd3635 Add some comments and tests
new 40d512b work
new 591da70 adding support for Float and Double
new 0025768 fixed tests
new 246efae refactor of the columns
new 527beef adding license headers
new 11b8190 Merge pull request #1 from julienledem/hack_week
new 1e7152b work
new 9da016d merged
new 71b7b11 work
new 02b40db work
new 38434c6 work
new f84fa27 work
new 30e3849 work
new 194cfdb work
new e12b27f fix support for int/long and map
new d080698 adding schema validation
new da419bf fix bug regarding null values in message
new 9f9b23d work
new 9c34678 work
new 525d3ae work
new 4cd5bc1 work
new 063486d work
new ba06716 remove a use of StringBuffer
new 46a3458 first pass at adding compression
new 4bfca1f make Codec configurable
new 1a0284d fix empty string bug
new 94781b0 VarInt for String length
new 1d40ac2 add javadoc
new c2ed3ee work
new 3b447a6 Merge pull request #8 from julienledem/add_column_compression
new 32bb525 work
new 94c10a2 Merge branch 'hack_week_jco' of https://github.com/jcoveney/redelm
new 33525b8 add JCO
new fd5bd4a record uncompressed size in footer; add detailed report of size and compression per column
new 3b8ad08 cleanup logs
new df97795 javadoc
new e2e3eeb first stab a decoupling the Input/OutputFormat from Pig
new 76ded4f store count of metadatablocks in footer
new f9be51e Merge pull request #9 from julienledem/javadoc
new f070939 Merge pull request #10 from julienledem/decoupling_InputOutputFormat_from_Pig
new 130e213 better hadoop layer decoupling
new 4fb9a48 moved hadoop implementation into its own hadoop package without dependencies on pig packages
new fe6066b get the compression codec from Configuration properties
new b507f7d make block size configurable
new 5271080 Merge pull request #11 from julienledem/move_hadoop_stuff_into_hadoop_package
new 1243fcc add status header
new 06ab2ed implement empty bag != null bag
new fbf8333 cleanup unnecessary variables
new 8774451 first stab at pregenerated Pig consumer
new 0094892 use non spillable databag for records
new 27c0255 make Map work
new ffe80dd add summary file
new 88ed7fe Support for short int columns by JCoveney https://github.com/jcoveney/redelm rep_def_column add pig snapshot
new af992f3 fixed doc based on jco's comments
new 4a372bc Merge branch 'master' into summary_file
new ad3b459 Merge branch 'master' into preprocess_pig_schema
new 40b0cb3 Merge pull request #13 from julienledem/fix_empty_bag_equals_null_bag
new 0c683e1 fix build warning
new bdc5a6e fix compression javadoc
new 60ca3a3 add documentation images
new daea1fd updated format diagram
new bce0f0d fix Codec Logging
new 6f4650a Merge branch 'master' into preprocess_pig_schema
new 48ef66b fix exception handling
new c73105c fix exceptions in Converters
new 4c0a1f7 Merge pull request #14 from julienledem/preprocess_pig_schema
new 0549961 Merge pull request #16 from julienledem/fix_Codec_Logging
new ea3d65e improve file format diagram
new 9385a2f Merge branch 'master' of github.com:julienledem/redelm
new ee02094 better logging and perf tests
new 9223ad5 Merge branch 'master' into summary_file
new 817b54b Merge pull request #17 from julienledem/summary_file
new 615c23c make splits contain all data blocks starting in the same HDFS block
new b23e1f6 add missing license headers
new e186b7d fix UDFContext collision when multiple stores
new 73d4fde first stab at record reader compiler
new 0f88e29 simplified record reader; a little more of reader compiler
new 104b219 remove currentNodePath from reader and improve perf a lot
new 7133e58 Merge pull request #19 from julienledem/remove_currentNodePath_from_reader
new c4a41dd more fsa codegen
new 072db3e change record reader init lifecycle
new 48f0e8f refactor record consumer materializer
new b6f0ca6 remove unnecessary constructor param
new eaf3171 Merge branch 'master' into improve_record_consumer_interface
new 4e5da20 introduce state object
new 27c93d0 more use of the state class
new 8f2c7e5 slight improvement to the record reader
new 7dd9300 remove dependency of column io on column store
new e735459 remove unecessary class parameter
new 5334a2d add missing license headers
new 11b6a8f Merge branch 'improve_record_consumer_interface' into FSA_codegen
new eaeff04 refactor reader
new c4c6991 rewrite of the code generation bit (work in progress)
new 180222e rename RecordConsumerWrapper
new eba3e21 Merge branch 'master' of github.com:julienledem/redelm into better_InputFormat_logs
new f3f02f5 Merge pull request #18 from julienledem/better_InputFormat_logs
new 0c3a1d8 Merge branch 'master' of github.com:julienledem/redelm into improve_record_consumer_interface
new 1954277 removed cheesy comment and hardcoded max depth based on jco's comment
new fa4c4ba removed unnecessary whitespace
new d68307e Merge branch 'master' into fix_schema_passing_when_multiple_stores
new 448e8c8 Merge branch 'master' into FSA_codegen
new faf756e fixed visibility
new 758bb8b Merge pull request #21 from julienledem/fix_schema_passing_when_multiple_stores
new d71d754 Make PigSchemaConverter Static
new 19106d0 Fix merge conflict
new 676c471 Make maps work with sigle column case
new db75e0b add missing base implementations
new 7aaf6f9 removed unecessary readOneRecord() method and commented out old code
new 770cf4d more optimizations
new e5dec2b optimizations;fix some bugs
new b896112 trigger full gc before starting
new 48d9105 Revert PigSchemaConverter to static
new ba0f9ad Update gitignore
new d202560 Incorporate Julien's comments
new d18ec37 Merge pull request #23 from jcoveney/fix_simple_maps
new 78cff39 Merge pull request #20 from julienledem/improve_record_consumer_interface
new cd7117b some modification to understand better impact of gc
new fd93ec2 Merge branch 'master' into FSA_codegen
new d5ea045 cleanup unused import
new da4b034 add license headers
new 2a3d6af make BaseRecordReader its own class
new 2883f89 merge both switch statements to optimize
new b89750b some cleanup based on Jco's comments
new 17271e7 initial integration with the new metadata
new f25dfdc fix row count per rowgroup persistence
new 8ec8a9d fix storer problem with page storage
new d63ded4 remove string type; make schema in footer use object model; make ints little endian
new cd9e8fe integrate the schema change from children to parent
new 3043e75 optimize buffer copy; revert parent
new 2e09b42 cleanup string type
new be00886 better support for plain encoding
new 6cdf6aa generator for the int_test_file
new f3adb6e renamed to Plain to match the Encoding name
new 781f40f PLAIN encoding comformance
new 1bd47c3 generate TPCH customer
new 07d7aa0 change children_indices for children_count in schema representation
new 867e24b fix repetition for root
new 9c7e5f5 add compression back; make page size configurable; rename children_count to num_children
new 18d5082 fix decompression
new 04b60ee rework compression
new d8a18ce fix PrintFooter; fix string encoding; rewrite split generation; fix block reading logic
new 2ea2954 split stores
new 4b06058 move compression to decode time and fix compressor init overhead
new 113f8b8 reworked converter framework
new 372b9f3 fix projection using pig schema
new b38de8a add allocated usage monitoring
new a62af63 fix bit packing
new 15f6beb expose schema to pig
new 1823220 rename the metadata package
new b598378 removed reference to red file
new 90f027e change magic number to PAR1
new e6ca25c adding license header
new 08df187 Merge pull request #25 from julienledem/integrate_format_changes
new 951cd2f Merge branch 'master' into FSA_codegen
new adcab26 Merge pull request #24 from julienledem/FSA_codegen
new 33aad25 rename package to parquet
new 73f6997 rename package to parquet
new 24c85c7 renaming maven artifacts
new ac3a806 renamed to parquet
new bd1fde9 cleanup
new 5164918 Merge pull request #26 from julienledem/rename_to_parquet
new 4a0bb74 cleanup exceptions
new 85d8f09 removed brennus dependency (for now)
new 5141722 integrate thrift changes
new cfdcdff turn off customer test
new c5a894a split hadoop; add thrift
new f6adf0b integrate new converter; cleanup
new 78ff38c improve logs
new 9058bae improve use of summary file
new b2efb50 refactor the read/write support
new bbf7932 move to official pig release
new ad9dcbe javadoc; original type support
new 42c418c javadoc
new f2746e7 adding example output/input formats
new 77097b6 ThriftParquetOutputFormat
new 92e93b1 license header
new c5ce15e license header
new 6bf2661 Merge branch 'master' of github.com:julienledem/redelm into parquet_thrift
new af20471 javadoc; bug fixes; thrift support; refactoring
new 25b7559 thrift read protocol; fix repetition level size in little endian
new 9c79b08 thrift input/output format support
new ecf19dc add encoding information for the column reader; allow column writer to specify the encoding
new e8f8429 populate encodings in column metadata
new d35c264 turn byte[] into Binary object in the api
new 932695c update dependency on elephant-bird
new 07ea133 license headers
new ee8ec11 javadoc; turn off the compatibility test for now
new 54dd652 integrate the thrift changes
new 0973e7d removed outdated comment
new c3c22a0 Merge pull request #1 from Parquet/parquet_thrift
new 003299e Style cleanup and other miscellanea (javadoc, etc)
new 591c2b9 move compatibility test to the appropriate repo
new 76fd1d8 cleanup
new 962dd9e More cleanup/renames
new 7ea3721 Merge remote-tracking branch 'origin/master'
new d4a7da5 Merge pull request #3 from toddlipcon/master
new 25608ae cleanup
new ca1f11a Merge branch 'master' of github.com:Parquet/parquet-mr
new bf0bd57 Update README.md
new b946355 thrift enum and list fixes; ParquetReadToWrite
new 5daf068 more thrift bug fixes
new 668d74d exception cleanup; creation of parquet.hadoop.api package; thrift from bytes support; bug fixes
new e1b2f29 Fix pom.xml
new 70abea6 Merge pull request #5 from mickaellcr/fix_pom
new ed7a067 move to groupId com.twitter so that we can publish to maven central
new 0703f01 Merge pull request #4 from Parquet/thrift_fixes
new 45138bb integrating feedback from Todd; renaming PrimitiveColumnW/R to ValuesW/R
new 6b9366b renaming classes and packages based on feedback
new e0dc2f3 reorganizing packages and deleting old classes
new 6dff626 Merge pull request #6 from Parquet/move_to_twitter_group
new 63949a3 Merge pull request #7 from Parquet/renaming_classes
new ef6b02c fixed merge issue
new 6e33c3d fix Filesystem access issues mentioned by Dmitriy
new 5a52fe1 fix Filesystem access issues mentioned by Dmitriy
new 12b99b1 metadata file in parquet format
new 251855b better metadata file tests
new 37ad86c removed old doc
new f7ba78a Update README.md
new dfd872b javadoc
new 37b7041 better tests for new summary file
new aa6dca8 integrate thrift format changes
new dabb797 Merge pull request #8 from Parquet/metadata_file
new 11c34bf Merge branch 'master' of github.com:Parquet/parquet-mr
new bb7d9f6 Update README.md
new bdec17e integrate Todd's feedback
new 4b45e24 Merge branch 'master' of github.com:Parquet/parquet-mr
new d07c10e Fix LICENSE
new 7a6c784 Update README.md
new a524979 Remove the old license.txt
new 7bd223a improved OutputFormat javadoc and defaults
new 7d90176 Merge branch 'master' of github.com:Parquet/parquet-mr
new 408785a adding back the license header used by the maven plugin
new 7a9a750 Update NOTICE
new 68f4a5d Update README.md
new c502c21 add deploy configuration
new bf1b494 Merge pull request #10 from Parquet/allow_deploy
new 241634e better exception when reading unknown field
new de8bc0d fix java 6 compiler compatibility
new 5fe97c8 add pig schema in thrift metadata
new 65364a4 fix map of primitive; add thrift to pig compat
new c44ff2d cleanup
new ef0069e incorporate feddback; more tests
new 82d5d85 Merge pull request #13 from Parquet/thrift_to_pig_compat
new e93e35c fix metadata file in mr mode
new df3e94a change default level encoding to bit packed; instanciate reader from page header encoding
new 4828f1d fix metadata conversion
new d81380a add test to ensure enums are equivalent
new e087908 integrate elaphantbird 3.0.8
new 071b8f4 change ReadSupport api to fix projection support
new cf6dbc3 Merge pull request #16 from Parquet/update_encodings
new 5dfa1b2 deal with elephantbird handling of numbers
new 696bce4 use constant for settings
new f9c25b6 Merge pull request #17 from Parquet/thrift_to_pig_compat2
new a4e5f02 Update README.md
new cc5cd63 Update README.md
new fce6998 avoid string decoding recoding
new c625aa6 Merge pull request #19 from Parquet/improve_thrift_perf
new f5b2cb8 add better Bytes plain decoder
new 56faa1b first stab at dict encoding
new cf5ba49 fix perf test
new 1b2694e fix offset
new 8246e0b fix perf problem with new String(bytes, offset, length, encoding)
new 5c60ed8 remove one array copy
new 60a2925 Merge pull request #20 from Parquet/perf_improvement
new e18d38b dictionary encoding
new 827a5bc fix dictionary encoding
new 4a69511 dictionary encoding
new b41e6e2 relocating jackson inside the parquet-thrift jar
new 90b5eae add scrooge and cascading support
new 9025c87 add indirect jackson jar dep
new dd057cf merge shade_jackson, fix compilation errors
new 3d63626 Merge branch 'shade_jackson' of github.com:Parquet/parquet-mr into scrooge_scalding
new d732183 apply jackson shading to all modules
new d006b51 Merge branch 'shade_jackson' of github.com:Parquet/parquet-mr into scrooge_scalding
new 8f9f0c7 integrate thrift change in format
new a4f133a cleaning methods
new 7025ede Merge branch 'integrate_format_changes' into dictionary_encoding
new 8fc53c4 improve logging
new 5d16042 Merge pull request #21 from Parquet/shade_jackson
new e9aa5d5 Merge pull request #22 from Parquet/integrate_format_changes
new 6864d2e Merge branch 'master' into dictionary_encoding
new dc585ac Merge branch 'master' of github.com:Parquet/parquet-mr into scrooge_scalding
new 9c62f41 improve dictionary
new 0734217 add license headers
new 9656ef2 improve api; improve logs;improve PrintFooter
new 08c8f82 use published EB, fix NPE in ThriftMetaData
new d5b3b7d better logging
new 0bb5e93 first stab at rle encoding
new 4fe082c BitPacking up to 31 bits
new f7a47d3 adapt Lemire's scheme to our value ordering
new f2ff9e8 add license headers
new 07c56fc add notice
new 4611c47 writer readers for int based packing
new a0926f3 add both orders as we might want to change our encoding in the future
new b0e9609 more tests and bug fixing the Bit packing
new e3e8159 make things that look like closeables implement Closeable
new ee8c25e Merge pull request #27 from Parquet/make_closeable
new bca0411 Merge branch 'master' of github.com:Parquet/parquet-mr into scrooge_scalding
new bcdddf7 implement byte based batch bit packing
new c2bdea0 address comments from alex l.
new 9942ce1 move simple RLE to generated bit packing
new aa851eb remove broken reader/writer
new 61d5170 fix bug where a required field would not be created at the right level
new 8b69ad5 address comments from @J_
new 1d21646 Merge pull request #29 from Parquet/fix_definition_level_for_nested_required
new f866bb8 more tests for optional vs required
new ee4a1b8 Merge pull request #30 from Parquet/fix_definition_level_for_nested_required
new 9d2df13 Merge branch 'master' into dictionary_encoding
new d64c883 mae dictionary more generic; allow converters to understand dictionaries
new feadc9e Merge pull request #24 from Parquet/scrooge_scalding
new 31b91ee address review comments by @squarecog
new 02b283d Initial support for Avro.
new 42c38a1 Remove unchecked generics warnings.
new 61a163d Honor repetitions correctly.
new 568bd7f Add Binary.fromByteBuffer method.
new 11bb824 Remove unnecessary level of grouping for array.
new e47f3b1 Fix creation of arrays and maps in converters.
new 58d8c52 Remove incorrect record initialization to compensate for broken support for nested records (not yet fixed).
new e365840 Create generic Parquet reader and writer for object records.
new 1605cda Avoid copying bytes if ByteBuffer is array-based.
new 5121cd5 Avoid double conversion of bytes for Avro Utf8 instances.
new 64c45cb Add test for nested records following fix in 61d5170844aaf611555a0dd63c5e24af08acf1c8
new 80adbcd Remove -Xlint:unchecked flag from the build for the moment as it causes CI to fail.
new d3c1a34 Fix compilation with Java 6.
new 34ce924 Merge pull request #26 from tomwhite/avro
new a5d72a4 Merge pull request #31 from tomwhite/java6-compilation-fixes
new 19e0902 make converters dictionary aware
new aed56c9 integrate the new bit packing for perf
new 133d845 Merge branch 'master' into rle
new a3e8963 Merge branch 'rle' into dictionary_encoding
new 67a3577 make field private; add braces for one line if statements
new 999b214 use BytesUtils.paddedByteCountFromBits everywhere
new b125dee make initial capacity a constant
new 7cb782c make a constant for constant value; remove outragous System.out.println()
new e548432 add new line
new 4c11b66 typo
new 280cea3 make the API treat empty fields the same as missing fields to avoid confusion
new 1db1018 turn on validation for generate TPCH
new 3f09751 making empty fields illegal
new 7c0f1a6 rename fromSequence to concat
new 5db276f cleanup import
new 0550545 Merge branch 'master' into rle
new 43dcb03 Merge branch 'master' into dictionary_encoding
new 6b867e5 skeleton for an efficient converter from groups to cascading tuples
new 065a3c9 working selective tuple materialization for cascading
new 593a105 short class comment for the TupleScheme
new 1f0a8a2 replace DeprecatedContainerInputFormat with DeprecatedParquetInputFormat, should build under MR2
new 5e82439 fix up cascading and scrooge to use DeprecatedParquetInputFormat
new 20a4bf7 DeprecatedParquetInputFormat is not abstract
new 2676de9 Treat Fields.UNKNOWN as Fields.ALL
new a49a0e9 don't create a TaskAttemptContext in ParquetReader
new 74157a0 Fixed potential Integer overflow.
new fc0c7cd integrate RLE into dictionary encoding
new 7cb711c Merge branch 'rle' into dictionary_encoding
new 4a8913e update git ignore
new 249e889 Use a simpler serialization for cascading Fields to be compatible with older cascading versions
new 8cb82ee javadoc
new c96e794 + needs space
new d6e3866 Merge pull request #36 from 0xh3x/master
new d822ef5 Merge pull request #35 from avibryant/deprecated-mr2
new 30b461e skeleton for an efficient converter from groups to cascading tuples
new 62df123 working selective tuple materialization for cascading
new 2f0a779 short class comment for the TupleScheme
new 1ee87d8 Treat Fields.UNKNOWN as Fields.ALL
new f2ab7a2 Use a simpler serialization for cascading Fields to be compatible with older cascading versions
new ffebada update ParquetTupleScheme to use DeprecatedParquetInputFormat
new 9222396 merge
new 3c96e97 Merge pull request #33 from Parquet/handle_empty_fields_as_nulls
new 634cb77 fix bug when printing a ByteBuffer based binary would consume the buffer
new aaa58d3 code formating and license headers
new 3dbccba add git hash in jar
new 9ec3565 Merge pull request #39 from Parquet/add_git_hash_in_jar
new 6855178 Update README.md
new 2d4be43 Merge pull request #25 from Parquet/rle
new 59f4b10 Use the standard readFooters in ParquetTupleScheme
new 88690f9 Fix Avro Read/Write support to work with the union-null optional value pattern
new 5ed6162 mvn license:headers
new c7ebfbb Fixes based on Julian's feedback
new f3ee0c9 Merge pull request #37 from avibryant/cascading-tuples
new cec7b39 Merge pull request #41 from jwills/avro-null-unions
new a1fbcfb make total size include header size
new 75ead0a turn LOGs back to INFO
new 6c26ece Merge pull request #42 from Parquet/make_total_size_include_headers
new f5ab5eb better error message when schema is unknown
new 6c1ccb7 Merge pull request #45 from Parquet/no_schema_error
new 0c25038 read version information from META-INF
new a20750a Replace JobContext#getConfiguration calls with reflective call.
new a67ea4e add test for hadoop2
new 5ab0918 update dependencies to hadoop-client
new fae4c56 Merge pull request #32 from tomwhite/hadoop2
new 647825b Changed two utility classes to public
new 09a54d6 Rolled back one of the public classes
new 3e4561b Merge pull request #46 from laserson/public_utils
new 45b893c add maven-jar-plugin version
new 306868d Merge branch 'master' into version
new cd25359 add library version to metadata
new be136d2 Update README.md
new f0c42ca Merge pull request #49 from Parquet/version
new e1aa798 improve memory consumption in write
new b30d7fe reduce rep and def level buffer size. 8MB * 2 * #cols is way too much
new f334966 Merge pull request #50 from Parquet/scale_down_overly_enthusiastic_buffer_size
new 90aca3c Merge branch 'master' into improve_mem_usage_in_write
new a0e82a8 add improved memory management in hadoop layer
new 922f6c5 add setting to turn dictionary on
new 6a4f8d0 handle case when value is bigger than slab size
new 0347e7b adjust initial column size
new f7e7dd7 more unit tests for CapacityByteArrayOutputStream
new d636d60 Allow setting compressor, block/page size for ParquetWriter
new 11d8e0e Added javadocs
new 21bb59d Oops, forgot about default compression
new 34a8fb0 Propagated default sizes to the OutputFormat
new 0add8d8 add constant and override annotations
new 7f7fa72 Merge pull request #48 from laserson/choose_compression
new d0cc3a9 check initial size
new ea3eb7b Removed getCounter for compatibility
new a53dde1 javadoc and constants
new 34d0b5d Merge pull request #52 from laserson/getCounter_compat
new 5b25bb5 add constants and doc
new 4f4c5c4 Merge pull request #51 from Parquet/improve_mem_usage_in_write
new b6d1cb0 add a validation setting to OutputFormat
new 05f103b Fix bug that prevented writing optional Avro records, arrays or maps
new 427137d Merge pull request #54 from massie/master
new a5b478e standard ordering of keywords
new 19b369b use checkNotNull
new e85b3c2 interfaces have public members
new df1ab6f introduce contants
new 70d3eeb better javadoc
new f9784bf remove unnecessary keywords
new 25d8ff2 javadoc and cleanup
new 5c2034a license headers
new 38d8a68 Merge branch 'master' of github.com:Parquet/parquet-mr into dictionary_encoding
new 5cedb4a license headers
new 9a30c8f Merge pull request #40 from Parquet/dictionary_encoding
new 0a7c59a Speed up Avro string parsing
new b3e9432 First pass at RLE hybrid
new 0f9eee5 Cleanup
new 2ffdab7 Add test for setByte()
new d29eeb5 cleanup preconditions
new 27d1c2c Add javadoc to ByteUtils
new c71cb25 Merge pull request #55 from laserson/fastavro
new 71bd4e2 bit packing support for LE
new cd3a123 Start tests for rle hybrid
new c9c1080 Add bit packing overflow test
new 24f6524 End to end test
new 65fb8d3 move unpack
new 63ed719 Fixup / rename RLEDecoder, fix tests
new 99853be Merge branch 'master' into alexlevenson/RLE-bit-packing-hybrid
new ded5bec cleanup comments
new 14e0574 Merge pull request #57 from Parquet/bit_packing_lsb_first
new abb6e36 Address first round of comments
new cd6a02d Merge branch 'master' into alexlevenson/RLE-bit-packing-hybrid
new d7fe1a5 Use RLE for repetition / definition levels
new 47824ef Merge branch 'master' of github.com:Parquet/parquet-mr into add_validation_setting
new af0ccf9 better error message and javadoc
new c211582 Merge pull request #53 from Parquet/add_validation_setting
new aed1dca dictionary encoding header is now bitWidth instead of max dictionary entry id
new 7cf85cb Fix RunLengthBitPackingHybridValuesReader
new e70652e Merge pull request #59 from Parquet/dictionary_encoding_format_adjustment
new 2dbd0d2 Remove logic for valueCount > Integer.MAX_VALUE
new 3ff63fd Merge branch 'master' into alexlevenson/RLE-bit-packing-hybrid
new 6e65166 fix bit packing encoding bug
new 1d13a61 create and use checkedCast()
new b7f6946 Merge pull request #58 from Parquet/alexlevenson/RLE-bit-packing-hybrid
new e0a5920 Merge branch 'master' of github.com:Parquet/parquet-mr into fix_bit_packing_encoding_bug
new 7e3ef9f Merge pull request #60 from Parquet/fix_bit_packing_encoding_bug
new c773446 ability to read version number from parquet jar Version utility
new a2b7a65 when there is more than one row group the converter will get multiple dictionaries set
new c3596a9 Add support for 4 byte length written at the beginning of rle columns
new c4c77ba add support for ReadSupport specific info in split
new 4950149 Merge pull request #61 from aniket486/master
new e27f871 Merge pull request #63 from Parquet/alexlevenson/fix-rle-4byte-length
new f7fbed1 fix comments
new 04ac202 Merge pull request #62 from Parquet/fix_dic_decoding_bug
new 1a30dcc Merge pull request #66 from Parquet/alexlevenson/fix-rle-comments
new ff109bc Merge pull request #65 from Parquet/ReadSupport_specific_info_in_split
new 5f0f929 Added filtering functionality
new ef5c143 Added avro specific functionality
new 61239a0 Added avro specific functionality
new 48bb48e Added avro specific functionality
new f3ed65b fix ValueStat max value
new 8285b62 Fixed bug querying on Name,Url
new ac5cbd1 Implmented more efficient skip algorithm
new 483dd9f Merge pull request #67 from svzdvd/master
new e440108 Add support for snappy compression.
new 7b74290 Fixed test case.
new 2519b95 Merge pull request #70 from Parquet/snappy
new be8e4e9 Merge remote-tracking branch 'upstream/master'
new 1d7a5c3 Fixing after code reviews
new e2e8edb small updates to README with feature matrix and other improvements
new 6a0dffd fix readme links
new 08b7aeb support for schema compatibility
new 7b47079 Should not write data if the RL/DL is all zeroes
new 29ba92c add a bit more pig detail
new c98f467 Merge pull request #73 from aniket486/write_no_data_for_no_RL_DL
new 94afb19 fix dictionary decoding bug when more than one encoding is used
new 96bf78d Merge pull request #71 from Parquet/update_readme
new 707bdf0 add negative tests
new 364b4a0 Merge pull request #72 from Parquet/schema_compatibility
new efc5982 fix schema compat
new 18122b4 Merge branch 'master' into schema_compatibility
new 3afe647 Merge pull request #75 from Parquet/schema_compatibility
new 992a47e fix call to converter
new 5fc6728 Merge pull request #74 from Parquet/fix_dictionary_decoding
new c4b14fb Adding APL headers and test for union schema creation.
new f52a26e Renamed checkValueRead.
new 80a449d Updated from master
new caa8e51 split Plain reader so that the reader knows what type it's reading
new 653a4cf collapse small classes into one class
new 89d6b17 refactored bit packing
new f1da1c7 Merge pull request #79 from Parquet/split_plain_reader
new 806b548 fix for schema compatibility
new 33c8dc5 Merge pull request #81 from Parquet/fix_schema_compatibility
new 43dc88d adding projection support for thrift types
new adb46b6 make splits report actual length
new 9a5d597 Merge pull request #83 from atkeano/thrift_read_projections
new 6bf0597 Merge pull request #80 from Parquet/encodings
new 6030c9e Merge pull request #84 from Parquet/splits_report_actual_length
new 4de2744 fix bad merge
new 79a3106 reduce memory usage of metadata
new fc56631 Initial checkin for load pushdown
new 5214a65 minor fixes and refactor
new 64814a6 make fields final
new 80e72a5 Merge pull request #85 from Parquet/reduce_memory_usage_of_metadata
new feecf58 adding tests and removing comments
new ef2aa8e Merge branch 'master' into filtered_reader
new 7e31eec Merge pull request #89 from Parquet/filtered_reader
new 3b9e6d8 Update README.md
new 91fa2c4 Update README.md
new d8e6ba3 added code review changes
new 5a5bb7f initial commit for recursive listing
new 3921742 small fixes for hadoop2 failure
new 61f2c86 add buffer to protocol pipe
new c2ad764 Merge pull request #90 from aniket486/list_recursive
new 9f41d31 Merge pull request #86 from aniket486/load_pushdown
new 02d5ed2 fix merge conflict
new aa0bc13 Add Avro specific support to AvroParquet{Input,Output}Format
new de0d0cb Merge pull request #94 from massie/avro-specific
new 3f19ce3 reduce size of splits
new dd20df1 Merge pull request #87 from Parquet/reduce_size_of_split
new f46bdaf Merge pull request #93 from Parquet/add_buffer_to_protocol_pipe_in_master
new a80029e try github site integration
new aedcdde fix pom conf for github pages
new 4643a5c add coverage report to site
new 6b5b8b2 improve memory usage of metadata
new 54ac6b4 license headers
new f7d0987 fix compilation issue with 1.6
new 964e5da Add support for schema projection in Avro
new 62c3155 Merge pull request #96 from massie/master
new c1d67ee Add support for predicate pushdown in ParquetInputFormat
new 435b13b Merge pull request #98 from massie/matt-pushdown
new 8c1032d Merge pull request #97 from Parquet/improve_memory_usage_of_metadata
new c7a8eaf Start implementation parquet for hive :
new 2471e51 Remove any K,V from DeprecatedXXFormat
new fcc88f3 Add support for CombineHiveInputFormat
new f5ca27b Improve column reading
new 4fe18c5 Can write complex types
new b4d1c71 Can read some complex types
new 07e54a5 rename one parameter
new 1ada3d2 Some improvements on the hive implementation :
new ebf76d4 Indentation : retab to 2 spaces. Nothing else.
new 126da3c Give selected columns to ParquetInputFormat : Done
new 910e7cd Add equals and hashcode methods to BinaryWritable
new 7f534d5 Implement a basic version of the SerDeStats object for ParquetHiveSerDe
new 6c219a5 Fix Short object for Hive (use short for short instead of byte :))
new bd826ec Add a simple unit test for ParquetSerDe
new 33e0131 Add some unit test in order to test :
new 1dc42f0 Improve the pull request following advices from Julien
new 74be528 Add full support for array and map reading
new d615729 Add unit test for storage
new f2d9e81 update hadoop version
new 4198153 Fix compile with abstract methods
new cc6754c Fix CombineHive bug
new cee774c Fix more combine stuff
new 174c26a Correct fix to CombineHive
new 3ee49a5 Change MapWritable to ArrayWritable (perfomance improved !)
new 8234945 Remove unused parameters
new 543cdc2 Fix the size of the value array
new bb6e2ff Hadoop 2.0 compatibility, hive 0.10
new 0ec089d Add metadata in ReadContext instead of Split
new 82fff8c Clean up ReadSupport init
new 3089e83 Manage count 0
new 21b0d97 Update with advices from Julien
new eccbba1 Improve speed for queries like count(0), in which we only need the number of lines
new bf7263b Try to fix travis build
new 26360f3 Minor changes
new 2525587 Update getSplits in DeprecatedParquetInputFormat
new 4cf6ae1 Code review
new ed9eec8 Add "how to contribute" to README.md
new a2ec5cb Merge pull request #102 from Parquet/how_to_contribute
new 61f2260 Merge pull request #95 from Parquet/doc
new 92c450a Merge pull request #28 from mickaellcr/parquet-hive
new 273ecd4 Update Hive support status
new 67b8423 ThriftParquetReader and ThriftParquetWriter
new be49204 change default page size and add some doc
new 0fbd026 Merge pull request #105 from Parquet/ThriftParquetWriter
new b7fe532 Merge branch 'master' of github.com:Parquet/parquet-mr into change_default_block_size
new 8a62bb3 fix doc
new b8576b9 Update README.md
new 4bc2433 [fix validation script] when boolean value is null, set it to 0 for being compatible.
new 265ef24 Merge branch 'master' of https://github.com/Parquet/parquet-mr into fix_boolean_default_value_for_tuple_converter
new 0e8f1f7 Merge pull request #107 from Parquet/change_default_block_size
new eac5aec 1. return compatible schema when compatible flag is set. 2. tupleConverter set to return IntegerConverter when flag is set
new c4e8d26 optimize code format, add log info to indicate boolean will be convert to int when compatible mode is on
new 7a4b562 add if debug statements to parquetloader
new a7c42f9 Make writer independent
new 504833e Make reader independent
new 7d1fe78 remove space, add braces for readability
new 82579c9 Merge pull request #108 from Parquet/elephant_bird_compatible
new 09ba2fb Merge pull request #111 from aniket486/master
new 3c55f58 Merge pull request #112 from tomwhite/issue-64-mr-indept
new a6e3fe7 upgrading scrooge runtime version to 3.1.1
new 2289141 Merge pull request #113 from aniket486/master
new 6218072 move github site to profile
new bb74d3d add description
new 6eec81d [maven-release-plugin] prepare release parquet-1.0.0
new 78481ad [maven-release-plugin] prepare for next development iteration
new aca5615 Update README.md
new ac0caa0 Update README.md
new ecb2dac refactro column reader
new f8dd208 simplify end of page count
new aa530c1 Merge branch 'master' of github.com:Parquet/parquet-mr into refactor_column_reader
new 26932a5 Minor README.md fix
new c126179 remove raw type for ParquetTbaseScheme to support thrift0.5; remove scalding dependency
new 2a32f34 Merge pull request #119 from Parquet/thrift_05_compatible
new 86ae4f8 fix wrong converter: use TBaseRecordConverter for ParquetTBaseScheme; Add unit test for getting correct record converter
new a54414c use Mockito to mock varibles in test, fix format and variable name
new 5a25926 Merge pull request #121 from Parquet/fix_wrong_record_converter_class
new 6232daf fix javadoc
new 1fc0698 Fix RLE bug with partial literal groups at end of stream.
new a9e2c7d Fix Short and Byte types in Hive SerDe.
new 35c6dc6 removing github-pages-site target before releasing new 8ffe30d [maven-release-plugin] prepare release parquet-1.0.1 new 05c73c4 [maven-release-plugin] prepare for next development iteration new f17b83c added unit tests for parquet cascading new 7e16d31 better format new f327ecf format new e4269e6 remove blank lines new 3a3b73a Merge pull request #126 from Parquet/unit_tests_for_parquet_cascading new 3c86936 Merge pull request #120 from Parquet/rle_fix new 8d66611 adding dictionary encoding for long,double,int,float new 28da58c Fix Snappy compressor in parquet-hadoop. new 3934855 update plugin versions for maven aether migration - fixes #125 new 3fad4bd Merge pull request #133 from atkeano/maven_build_errors new 87228eb refactoring dictionary encoding for non string types after comments #127 new 12bc29a split out method to facilitate the inliner job new 99673f2 Merge pull request #127 from atkeano/dictionary_encodings new ce8c1a4 Merge pull request #118 from Parquet/refactor_column_reader new af45d9c fix bug of wrong column metadata size new 3fb938b Merge pull request #138 from Parquet/fix_column_metadata_size new 6ff0264 Implemented partial schema for GroupReadSupport new c9b213c Merge pull request #123 from Parquet/snappy_codec new a274684 added 3 counters to parquet for benchmarking bytes read and time spent new 43ad5e5 fix test new 2cc9321 add test for no benchmark counters new ccf32c7 remove comments new ea62ffe add unit test new 9394c09 fix test new c269726 formatting new 4288aa6 formatting new d4ef8d9 Merge pull request #140 from Parquet/partial_schema_for_group_read_support new 1323e4f Merge branch 'master' of https://github.com/Parquet/parquet-mr into hraven_counters new 397b4c9 formatting new f8e2658 fix incrementCounter getConfiguration method to support 2.0 new 3bebb9a fix new 35c419c remove public Constants new cb6d3d2 Merge pull request #141 from Parquet/hraven_counters new 9ef85a0 merge new 42ad701 fix test file path new 4d1b3e0 add unit 
test new 09bfe99 Merge pull request #124 from Parquet/serde new 92ce68d fixed new 77bede5 add test new fb67069 fix space format new 37bd05d Merge pull request #142 from Parquet/fix_total_size_row_group new f61a123 Merge branch 'master' of https://github.com/Parquet/parquet-mr into fix_empty_encoding_col_metadata new d2878f8 Merge pull request #143 from Parquet/fix_empty_encoding_col_metadata new 8f93adf Map key fields should allow other types than strings new 9adb8e2 code review changes new c22a357 Merge pull request #144 from aniket486/map_support new b500681 add getStatistics method to parquetloader new e5b767a Add some nested type tests and fix Map handling new 8cc147b Add map and list to in/outputformat unit tests new c76880d Merge pull request #146 from Parquet/hive_nested_types new 808a90d changing default block size to 128mb new 4202efb Merge pull request #149 from aniket486/change_block_size new aab7b4b code review comments for stats new 4b4fb0e Merge pull request #145 from aniket486/stats_loader new bee8378 [maven-release-plugin] prepare release parquet-1.1.0 new 62cc2c2 [maven-release-plugin] prepare for next development iteration new bbae83d Create CHANGES.md new 7b2ef26 add thrift validation on read new 8a8354b add better error message new 71a6d88 add better error message new c8ba085 Merge pull request #150 from Parquet/add_thrift_validation new 945d1bd [maven-release-plugin] prepare release parquet-1.1.1 new 0784dc9 [maven-release-plugin] prepare for next development iteration new 91c1711 fix projection on required fields and refactored unit tests for column IO new 69ef1f4 fix file path new 413418e added release 1.1.1 new c023d63 improve thrift error message new bde6493 Merge pull request #153 from Parquet/fix_projection_required_field new ec46329 use globbing syntax to specify manual pushdown in ThriftReadSupport new 17e2511 Merge branch 'master' of https://github.com/Parquet/parquet-mr into manual_pushdown_for_thrift_read_support new 30359b4 remove 
TODOs and fix format new 3553c02 add site target to update-github-site profile new 874e470 indent fix, remove tabs new 57b1e0f Merge pull request #156 from Parquet/fix_site new 459a8a1 make counter works in DeprecatedInputFormat, which is used by cascading new d20e5f2 fix tests new 00065cd change filter key name to parquet.thrift.column.filter, remove extra filter parameter from ThriftSchemaConverter new bb06859 add license and comment new e371fbb add comment new 6dfd975 Merge pull request #159 from Parquet/counter_for_mapred new 6b96924 Merge pull request #155 from Parquet/manual_pushdown_for_thrift_read_support new b045ac1 Resource leak in parquet.hadoop.ParquetFileReader.readFooter(Configuration, FileStatus) new 848fa8e support schema evolution new c32be9e thrift schema evolution support new f369a13 validate output new f0d30df refactor schema converter new 8faaaf0 support projection on only key of a map new 2043741 add thrift idl for testing new 7f08eee add license headers new 1a711f0 turn off projection from scrooge new eb15665 Add test cases for reading/writing Avro records with empty arrays and maps. new 50feb33 Fix tests for reading and writing Avro records with empty arrays and maps. new 0cedaf2 remove debugging code from hot path new 9594bba address review comments new 6f25a0f Correctly handle Avro records with empty maps and arrays. new 26d09f4 javadoc new 8c51a42 better error message new 8fa09f0 Merge pull request #163 from Parquet/thrift_perf new 80c3a2a Merge branch 'master' into schema_evolution new 60a3468 fix test new cfc91fc almost there... now working on not to use thrift class, so it's compatible with scrooge new 00a5d5b migrated to using ThriftStruct for schemaConverter, do not use thriftClass new 7387a61 passed all test, fix map, removed tests for pull in required fields new 147a3f0 start! 
do not check required field, failing test new 55c14bf javadoc new e5cb3c8 fix test new e9f2550 parameterize dictionary new 12d1ac4 fix noisy warning new 47116ad fix schema merging new d4dbf0b Merge pull request #160 from adityakishore/master new 05a0106 Merge pull request #161 from Parquet/schema_evolution new 5b36d9c make buffered by default new 2de2567 Merge pull request #154 from Parquet/add_thrift_validation new 4170539 [maven-release-plugin] prepare release parquet-1.2.0 new c4515db [maven-release-plugin] prepare for next development iteration new b3efce2 fix compilation problems new 3c274d1 Merge branch 'master' into fix_avro_empty_maps_arrays new 70226b9 distinguish recoverable errors new 68fa6cd fill in missing fields, only for str now, will refactor to visitor pattern new 7dfa864 visitor pattern for string, test passed new 1bf9d5f extracted inner classes from ProtocolEventsGenerator new c59e82f implemented all dummy values new 9a1e295 merge master new 6f374b7 store thriftType in converter[fix merge error] new 3e160d9 add license headers new b4a8eb1 fix test new 3060f85 added unit tests new 8aadc0a fix bug, use a new list for fixed events new b812389 inline some classes new f702fdf better naming new 133b252 add missing file new 073e202 fix test path new eff7237 remove converted Type new d7b0083 Add empty map and array to test Avro schema all-minus-fixed and add empty map and array fields to parquet-avro test that tests fields of all (except fixed) types. new 365d84e prepare for commit, remove format diff new 5ca7671 Re-enable test for fixed type fields in Avro TestReadWrite. 
new ffbdf6d visitor pattern for schemaConverter new 2b2837f refactor matching filter new 079e295 rename new 7b68b47 sucess: compile scrooge generated classes in parquet-thrift new 5393833 migrated tests to parquet-scrooge [tests passed] new d9ce726 add test in scrooge [only maven passed] new 0570f46 better fallback mechanism new 015ed30 fix oom error dues to bad estimation new 1e472d2 Merge pull request #167 from Parquet/fix_oom new 92f58ac [maven-release-plugin] prepare release parquet-1.2.1 new dae37cf [maven-release-plugin] prepare for next development iteration new 1d92804 Add typeLength to ColumnDescriptor. new a7ba48b created ScroogeSchemaConverter new e2d3bb2 [style]fix if...else in ConversionPatterns new 64e6d82 [style] add spaces around = new 99b6dfc remove julien's TODO new bbc0aa7 Merge pull request #166 from Parquet/avoid_pruning_required_fields new 9d84697 merge master new f6f3eaa broken tests for scroogeRead new 3803d2d Plumb FIXED type length from Avro schema through to Parquet metadata. 
new b4c45d3 fix bug, missing break in thriftSchemaConverter new dcc0d81 Merge branch 'master' into fixed_len_byte_array new 04784c2 test pass new ceef971 remove unused compat.thrift new 2a2696d format new 78b3f86 update scrooge denepdency, add unit tests for reading in scrooge new 249581d add TestCase for scrooge schema converter new c1f3512 change some ParquetOutputFormat interfaces to mirror ParquetInputFormat (and be useful for writing a DeprecatedOutputFormat) new ce6bfcc add a DeprecatedParquetOutputFormat to mirror DeprecatedParquetInputFormat new 7adc264 add another getRecordWriter overload new 9ae1d88 add Sink functionality to parquet.cascading.ParquetTBaseScheme new a00fd5e remove tests that check that TBaseScheme doesn't support writes new 521d081 add some convenience methods (from ParquetOutputFormat) new 93d6770 add a simple test for DeprecatedOutputFormat new 34bbb90 missing copyright notice new bba9775 two unused imports new 3c65205 field requirement depends on if the getter returns option new e2fec1c add optional map field to thrift file new 234a1cb extracted key and value type from map and optional map new 9a5eea0 working on map new d32e65e downgrade scrooge version to 3.6.0, which is the latest version on maven central new dd02df0 added map test new 8b84a9e specify scala version for scrooge new 5aa7a68 accidentally deleted a space new b11e2a0 Plumb type_length for FIXED types through to reading pages. new 4e82ab6 Add methods to write fixed Binary without prepending length. new 1f63013 Added two boolean options for record filters. new 58051d0 Added functionality to allow users to implement functions to be used as predicates. new 232d521 use class.getName new 038a400 update scrooge to 3.8.0 new 0f9e39b Merge pull request #165 from Parquet/distinguish_recoverable_exception new f8ac0f0 Initial end-to-end write and read support for Avro FIXED fields without runtime exceptions, but still with data representation issues. new 4f1493b Fix broken tests. 
Test failures encountered previously were due to broken tests. new 08b45b0 Add fixed_len_byte_array to oneOfEach in TestColumnIO. new 201d80c Merge branch 'fixed_len_byte_array' of https://github.com/davidzchen/parquet-mr into fixed_len_byte_array new e8c2a39 better log messages new 562e811 make binary dictionary encoding use fastutils; fix tests new f98cd39 shade fastutil and keep only used classes new 1c9c19c Merge pull request #171 from Parquet/scrooge_tests new bce04eb add 1.2.0 and 1.2.1 new 8305bdb Add FixedBinary type by creating a wrapper class around Binary and plumb FixedBinary through for read and write support for FIXED_LEN_BYTE_ARRAY. Undo change to add FIXED field to oneOfEach schema for parquet-column TestColumnIO for now. new a48f56f Re-add FIXED_LEN_BYTE_ARRAY to oneOfEach and plumb through FIXED support for example Group. Test still fails and need to solve read issues. new 1c99a11 rename variables for readability new 108509f Merge pull request #173 from Parquet/better_log_messages new bbf3448 add overloaded getFooConfiguration(JobContext) methods to ParquetOutputFormat new 310e551 throw the writeSupportClass as part of the exception message if instantiation fails new 20f3f46 continue renaming new 6da7594 fix problem with projection pushdown in parquetloader new cc59cb8 Use ValuesWriter and ValuesReader specific to FIXED_LEN_BYTE_ARRAY rather than overloading on a FixedBinary class. new 3e02a84 Merge branch 'master' into fixed_len_byte_array new e37cd2b Add fixed field to parquet-avro TestSpecificReadWrite. new ebe07c6 changes as per code review comments new a73e73c changes as per code review comments for test new 4694001 Merge pull request #175 from Parquet/pig_projection_pushdown new 51d3332 Merge pull request #174 from Parquet/readability new b7a39c5 [maven-release-plugin] prepare release parquet-1.2.2 new 0688404 [maven-release-plugin] prepare for next development iteration new 3b5d32b Add new Vin field to Avro TestSpecificInputOutputFormat. 
new e6ebda9 Update CHANGES.md new bc31bda Update CHANGES.md new e6fab06 Document why FIXED_LEN_BYTE_ARRAY is not supported with Avro specific schema right now. new e242085 add an empty constructor for ParquetTBaseScheme (which only works for reads) new 3105009 add read and write tests for ParquetTBaseScheme new 458dd70 remove redundant test new 3b7359b Re-enable tests for writing FIXED for Avro Specific records. Preliminary end-to-end for writing FIXED but write is still not completely correct yet. new 8e4278b missing test resources new 1b326b7 Fix reflection for converting fixed Binary to Avro SpecificFixed. Ensure that FIXED values are written using the FLBA PlainValuesReader when dictionary is enabled. new 12d41ac De-fluffify inadvertently added whitespace changes. new e31d46c merge master new a79eab7 Complete support for supporting FIXED_LEN_BYTE_ARRAY for Avro SpecificRecord. Add syntax to specify type length for FLBA type fields to MessageTypeParser. new 24d7267 Remove print statements. new 976c68a Merge branch 'master' into fixed_len_byte_array new 5e0dba7 upgrade scrooge to 3.9 new ca7da65 basic support for map new 6a6613f tests all primitive key types in map new 49f3ad1 Add support for reading FIXED_LEN_BYTE_ARRAY to Pig support. new 5ab6ccc Merge pull request #169 from davidzchen/fix_avro_empty_maps_arrays new 0822e32 add unit test for primitive value for maps new 9cd6737 Add comments to new files. new d9ced33 test optional map new f45b384 convert list and unit tests new 5410381 convert set and unit tests new d0fc6a0 Correct schema syntaxes for TestHiveSchemaConverter. new 6ffc1b9 implemented conversion for enum new bfcb120 implemented map with nested structure, TODO: tests failing since the default requirement can not be determined new 835f12e refactor code new 4b2cb26 Added unit tests for predicates. Got predicates compiling, and passing on tests. new 1ee4232 Merge changes from master that fix handling empty Avro arrays and maps. 
new 52c32a3 Removing predicate functions to prepare for pushing or/not filters. Limits number of features pushed. new b49018b Merging in changes from main repository that I have forked from to minimize work after pull request. new 2c5e07f Add support to AvroWriteSupport for writing out records with maps containing Utf8-type keys. new 3778e45 Pulling in clean modifications for adding ColumnPredicate functions. new a2895bf Remove print statement. new 0ad94f9 Fix a maven warning about a missing version number. new 6ec199d Disable the time read counter check in DeprecatedInputFormatTest. new 7802a9a Update ParquetReader to take Configuration as a constructor argument. new bb52d33 Merge pull request #179 from fnothaft/master new a258aae Merge pull request #182 from wesleypeck/fix_maven_version new a146ebb Merge pull request #183 from wesleypeck/fix_timeread new e20c490 Merge pull request #184 from wesleypeck/parquet_reader_projection new 753473c Change syntax for fixed_len_byte_array to placing length parameter after type name rather after field name. new 9aad641 Move reflection checks for specific Avro Fixed type into FieldFixedConverter constructor. 
new a8d99d9 add an assertion to check the output created by reading with ParquetTBaseScheme new 2ffde6a Merge pull request #172 from colinmarc/cascading-tbase-write-support new d24f4a7 fix new 4e6863a add checker new 6d9d2b3 add test new f325418 generate_json new 5bd87b1 check compatible new 8856e45 todos new bd311f5 fix_test new 001e3de map checker new 5bf6127 SetChecker new 5427f44 list checker new 2fb1f7d accept visitor new 64b2f72 fix new 0d39e1c add compatibility report new 6f0f236 fix tests new 2b62da3 requirement check new 2731c0f refactor tests new 2ed9b50 fail when required field is added new fdb0725 add tests for list set map new 9652d97 add parquet-pig-bundle new e3292ed Merge branch 'master' of github.com:Parquet/parquet-mr into pig_BUNDLE new ff4d13a add null check new 5adf79f Merge pull request #180 from davidzchen/fix_avro_utf8_map_keys new 7247538 compatibility runner print more detailed info new b3b0bbb fix indent new 1e61b40 fix version new 7c6ba3e Merge changes from master. new 989e9dc Plumb OriginalType through to ConvertedType in file in ParquetMetadataConverter. new e40dcfb Merge pull request #181 from davidzchen/fixed_len_byte_array new 5303197 Merge changes from master. new c1ac1af Merge pull request #186 from Parquet/pig_BUNDLE new 1ae6772 add dummy file to generate source jar new b168fd1 [maven-release-plugin] prepare release parquet-1.2.3 new 1326c00 [maven-release-plugin] prepare for next development iteration new fd3b05c release 1.2.3 new cfd63fd Fixed issue with test case that was causing runtime error. Was trying to call getInteger on long... new 308c1b4 Added two boolean options for record filters. new eb35ba8 Added functionality to allow users to implement functions to be used as predicates. new c6a4d18 Added unit tests for predicates. Got predicates compiling, and passing on tests. new 8be341f Removing predicate functions to prepare for pushing or/not filters. Limits number of features pushed. 
new 64921da Merge branch 'master' of github.com:fnothaft/parquet-mr new 3edf60d Manually merged in conflicts in TestFiltered.java. new 0a36e35 Fixes #189: NPE in DictionaryValuesWriter. new fd2935b Merge branch 'master' into plumb_original_type new 714335d compare json new 0d25979 Merge branch 'master' of https://github.com/Parquet/parquet-mr into compatibility_checker new 820fb75 remove unused test new dc425e4 show field name when they are not compatible new 5cad37b remove unused command from CompatibilityRunner, add comment for rules used in compatibility checking, add license header new 82052b8 Merge pull request #191 from Parquet/compatibility_checker new 09bcb1b fix comment new eff1e5f Merge pull request #192 from Parquet/comment_fix new da05b13 [maven-release-plugin] prepare release parquet-1.2.4 new 6dc4732 [maven-release-plugin] prepare for next development iteration new ff55567 Update CHANGES.md new 04ad0c4 Merge pull request #190 from wesleypeck/fix_dvw_npe new 2020142 refactor serde to remove some unecessary boxing and include dictionary awareness new 50bd144 Merge pull request #194 from Parquet/hive_perf new 763dfde Fix for columns list missing from the conf new 2660598 Update README.md new 422dfe0 Updated files to add applyFunctionToBinary, and add specific interfaces for primitive types. new 82f882f Merge branch 'master' into plumb_original_type new 10f266a Cleaning method signature for binary case. new 73c8629 Misunderstood previous comment. Fixed binary predicate. new cf0ee72 Merge pull request #188 from fnothaft/master new 256a3a1 Fix issue 193: RLE decoder reading past the end of the stream. 
new d4eeecc Merge pull request #197 from Parquet/issue-193 new d9e5f0b Implement correctly Settable inspectors new c5f68c5 Extract primitive inspectors and instantiate them only once new 4bdaec0 Fix #177: Inspect key when accessing maps new 9064500 Add some javadoc to clarify new 22cf7fe Inspect keys only for a few types in parquet hive maps new c9146a6 Merge pull request #196 from Parquet/hive_fixes new a991eff Merge branch 'master' into dictionary_changes new 0a76cc2 Fix #198: simplify TupleWriteSupport constructor new f52d35b Merge pull request #164 from Parquet/dictionary_changes new 5601394 make static field final new a905704 Merge pull request #199 from Parquet/simplify_tuple_write new a736c62 Merge branch 'master' into plumb_original_type new 9435918 Fix requested schema when recreating splits in hive new 005bc68 add null check for EnumWriteProtocol new 8edc893 Initial commit new 5c46f05 initial commit new dfa27b4 Merge branch 'master' of https://github.com/lukasnalezenec/parquet-protobuf new dd536a4 Delete todo.txt new ab4cb69 Make the ParquetLoader.inputFormatCache HashMap a WeakHashMap in order to free memory for long running processes that do not leverage caching new 5f29f43 use a new string in order to enforce weak reference on the key new 93780e0 use a new string in order to enforce weak reference on the key new cf1f442 use a new string in order to enforce weak reference on the key new 21efc9b use cascading 2.2.0 new 8edc102 throw ParquetEncodingException new a148e16 Merge pull request #205 from fs111/master new be83477 Merge pull request #203 from Parquet/check_null_for_enum_write_protocol new 3b63d13 fix comment new c5ce1fa fix comment, remove size new a34df07 Merge pull request #204 from aaghevli/ParquetLoader.inputFormatCache new 4649453 Merge branch 'master' into plumb_original_type new e337bd2 Protobuf conversion over Java types new a6aa8f7 [maven-release-plugin] prepare release parquet-1.2.5 new 9c5bdb6 [maven-release-plugin] prepare for next 
development iteration new 170797f Create PoweredBy new 9a91418 Create PoweredBy.md new 05c0f06 Delete PoweredBy new 6f1e812 Update PoweredBy.md new 75a6102 Update PoweredBy.md new 12a1cd7 Update PoweredBy.md new 70a433b Update PoweredBy.md new 81a1af0 improve fallback for IntDictionaryWriter new 1a4bb6a Merge pull request #206 from Parquet/PoweredBy new 11f30fa fix bug, add rawDataByteSize for dictionaryValuesWriter to decide if fall back to Plain encoding or not new be6a4ae fix bug: reverse dictionary lookup for fallbacking to plain encoding new 4d55b59 improve fallback for float new 1afdf14 minor fix, the length used in RLEValuesReader new d942b45 format new 55a451c Merge branch 'master' of https://github.com/lukasnalezenec/parquet-protobuf new 3c99aa3 improve fallback for double new b84e272 Merge pull request #207 from Parquet/fix_offset new 245d43e use primitve array for int, float , double, get rid of auto boxing,unboxing new c9b768f improve long fallback new bee6755 improve binary fallback new d33aa40 bug fix: separate fallBackDictionaryEncodedData to a method, will always be called when fallbacking to plainEncoding new 1278cde remove unused import new 65ca5ed Copyrights in converters new 492da11 remove hash lookup and unused comments new edfd7d9 return raw data size as bufferSize in dictionaryValuesWriter new 66900aa more comment new 7427a89 revert fixing page cutting, fix bug, raw data size should be long new 198f554 revert revert.. use rawDataByteSize as buffered size in DictionaryValuesWriter new a7de264 Specification of written protobuffer class in output format new 2e78704 Code cleanup new 090a2a4 Code Cleanup new 1bec97f Projections in read support new 40ae3fb artifact version changed to 1.2.5, unused dependencies removed. 
new 0d47734 Add test on DeprecatedParquetInputFormat.getSplit() new f479aee Merge pull request #208 from Parquet/improve_dic_fall_back new 7c2785f Merge pull request #202 from Parquet/hive_requested_schema new 6e90041 Merge branch 'master' into plumb_original_type new 59bd08b One of the constructors in ParquetWriter ignores the enable dictionary and validating flags. new 4849155 Merge pull request #210 from wesleypeck/fixwriter new 402e96d Wrong merge new 4daff70 group parquet-format version in one property new 8af5a22 Fix Binary.equals(). new 0b3400a Merge pull request #215 from Parquet/binary_equals new 31aaa53 Merge pull request #213 from aniket486/parquet_format_pom_refactor new f4d6e17 Merge branch 'master' into plumb_original_type new 0334948 parquet-hive should ship and uber jar new bbfa4ac Address comments on pull request new 700c223 Merge pull request #220 from brockn/master new 1028fb9 make pig, hadoop and log4j jars provided new e54735a Merge branch 'master' into cleanup_dependencies new 61af0b9 Merge pull request #221 from Parquet/cleanup_dependencies new 014f583 [maven-release-plugin] prepare release parquet-1.2.6 new 9424018 [maven-release-plugin] prepare for next development iteration new de81ee8 changelog for 1.2.5 and 1.2.6 new 6b5d2d1 fix bug: set raw data size to 0 after reset new 8e1110b Merge pull request #222 from Parquet/fix_dic_fallback_page_cutting new f4ad9df refactor encoded values changes and test that resetDictionary works new e0c5ac8 Merge pull request #223 from Parquet/dictionary_reset new ab7959d [maven-release-plugin] prepare release parquet-1.2.7 new f587471 [maven-release-plugin] prepare for next development iteration new 493bb9f Changing read and write methods in ParquetInputSplit so that they can deal with large schemas (avoiding use of writeUTF and readUTF which are limited to 65536 characters). new d2ccc72 Breaks parquet-hive up into several submodules, creating infrastructure to handle various versions of Hive going forward. 
new f18bc49 enable globing files for parquetTupleScheme, refactor unit tests and remove binary test fixture new d7c8467 Merge pull request #224 from dave2718/master new d7994dc add changelog tool new 60c6512 Updates Hive 0.12 compatability patch by adressing all comments from Julien's review plus a few additional cleanups, specifically: new 4d13df5 encapuslate getFooter into a separate method new c31a6be Merge pull request #228 from Parquet/glob_files_for_parquet_tuple_scheme new c2499da [maven-release-plugin] prepare release parquet-1.2.8 new e1d335b [maven-release-plugin] prepare for next development iteration new 842500e [maven-release-plugin] prepare release parquet-1.2.8 new 6cb038c [maven-release-plugin] prepare for next development iteration new 3b4ae5e Merge branch 'master' of https://github.com/Parquet/parquet-mr new b297c73 optimize chunk scan; fix compressed size new 476b8ea Merge branch 'master' into plumb_original_type new e1ce063 check if pig is loaded when writing pig metadata new 7dfd436 format new 7fa1b6a make cascading a provided dependency new 3b829a2 refactor get codec logic to remove duplication in DeprecatedParquetOutputFormat new 70f29c7 add cascading dependency to scrooge, and add cascading.version propertie in project pom new 6fa653b Merge pull request #236 from Parquet/make_cascading_a_provided_dependency new 8b0d05c Merge pull request #229 from Parquet/changelog_tool new 407a52d fix missing codec new 7641feb Merge pull request #227 from brockn/master new 716a030 remove lzo test and lzo dependency new 0b61cd9 Merge branch 'not_write_pig_meta_data_only_when_pig_is_not_avaliable' into handle_codec_not_found new 5a04096 license header new 491481e Merge pull request #235 from Parquet/not_write_pig_meta_data_only_when_pig_is_not_avaliable new f7b2cd7 make CodecConfig a factory new 2e3a370 restore getCompression methods in ParquetOutputFormat for compatibility new 0810736 fix pom version caused by bad merge new 3db0d58 Merge pull request #238 
from Parquet/fix_version new 8958626 Merge branch 'master' into handle_codec_not_found new e879680 Merge pull request #237 from Parquet/handle_codec_not_found new 92a47b2 Fix hive map and array inspectors with null containers new a39ad4c fix loader cache new ca01d15 make the cache use a SoftReference new 090d542 Update CHANGES.md new aca1d8b Merge pull request #234 from Parquet/optimize_chunk_scan new c95cb21 Merge pull request #239 from Parquet/hive_fix_null_maps new 760367b Update reference to 0.10 in Hive012Binding javadoc and remove some trailing whitespace I noticed when while updating the javadoc. new 54308f7 Merge pull request #241 from brockn/master new c73754b use latest stable release of cascading: 2.5.1 new 3f75f0e Merge pull request #233 from fs111/master new bb9d898 Merge pull request #240 from Parquet/fix_loader_cache new 22282a9 upgrade elephant-bird version to 4.3 new 600e7c9 Merge pull request #242 from Parquet/upgrade_eb_to_4_3 new 7436d8f add source to parquet-hive-binding new eb4966f [maven-release-plugin] prepare release parquet-1.2.9 new a6f140c [maven-release-plugin] prepare for next development iteration new e5ed117 Update CHANGES.md new 8e23c24 add parquet cascading integration documentation new 9df136c fix typo new 99798b5 fix grammar new 955cd7e plural for records new 847df8f fix changelog new 59601d7 Update CHANGES.md new a347481 improve changelog new 17146c3 Merge branch 'master' into plumb_original_type new e2d819c Loading correct pbClass to ProtoSchemaConverter new 08a204d Depricated init override removed new 83f0646 pom.xml version 1.2.10-SNAPSHOT new 0517253 TestUtils refactoring new c590038 Obsolete test removed new 5bb9e8d integrate parquet format 2.0 new 314ac2b Merge pull request #245 from Parquet/integrate_parquet_format_2 new 652b0fe Merge branch 'master' into plumb_original_type new e36b2f0 implement error handler new 8269a6f handle extra field in data new 3d4513f add checkEnum new 564f370 add tests, fix bug new da4b7fd 
refactor new bdf5d6b Merge pull request #187 from davidzchen/plumb_original_type new f5eb89d Merge pull request #244 from Parquet/feature/error_handler new e29c2df fix when field index is greater than zero new e94b392 format new cd00dc8 Merge pull request #247 from Parquet/fix/detect_extra_field_when_index_is_not_start_from_zero new aad047a [maven-release-plugin] prepare release parquet-1.2.10 new 5f13c8c [maven-release-plugin] prepare for next development iteration new 0743b60 Update CHANGES.md new 0a01dae Use ContextUtil in tests to avoid dependency on parts of new MR API that are incompatible between MR1 and MR2. new 0df24f0 Rename ParquetInputFormat#addInputPathRecursively to avoid clash with non-static Hadoop 2 method of same name on FileInputFormat. new ea9fd20 Fix syntax error in test that Pig 0.12 complains about. new e83778a make summary files read in parallel; improve memory footprint of metadata new f21fb31 Merge pull request #248 from tomwhite/hadoop-2-compatibility-fixes new 884a5e5 Merge pull request #243 from Parquet/parquet_cascading_doc new a34507d pretty_print_json_for_compatibility_checker new 18012a0 Merge pull request #250 from Parquet/pretty_print_json_for_compatibility_checker new da066e7 [maven-release-plugin] prepare release parquet-1.2.11 new 313c300 [maven-release-plugin] prepare for next development iteration new 392a801 refactor new 0888bde adress comments new 67a7a9d Add writer version flag to parquet and make initial changes for supported parquet 2.0 encodings new d185966 pom version fix new cf9a367 Merge pull request #252 from Parquet/refactor_error_handler new f2e7baa Resolves issue #251 by doing additional checks if Hive returns "Unknown" as a version. 
new 956ad07 Merge pull request #256 from brockn/master new 4a18684 changes for code review comments - enum as params, shortname for writerversion new f61331e In HIVE-5783 we will need a bundle jar to depend on that does not include the Hive Serde since Hive trunk will contain the Hive Serde. new a68c8fc Merge pull request #254 from Parquet/parquet_2.0_writer new c817785 delta int bin pack new d617084 formatting and license header new 290385c format new 74269e4 Merge pull request #253 from Parquet/delta_int new 978e396 ProtoSchemaConverterUnitTest new ffcc0b8 Merge pull request #257 from brockn/master new 2737282 optimize consecutive row groups scans new a4aef0d Initial commit new 861016b Removing hadoop-core dependency conflict new dba65be tests for Input and Output Formats new 16b2f73 ProtoSchemaConverter Code Style new 1394236 CodeStyle new c1b6161 add delta length byte arrays and delta byte arrays encodings new 5051acc fix minor typo in Encoding reader new 017d088 minor javadoc changes new 82b889c Merge pull request #1 from Parquet/master new 1f75813 junit test for enum schema conversion new 51ca71a remove old package info new 52ffcfe remove commented code new f2e607e add unit test new 7def49c Adds parquet-jackson module to jackson-dependent modules new 3013b9f Merge pull request #249 from Parquet/metadata_opt new 124f2ed Merge branch 'master' into optimize_scan new 3c91e46 refactor dictionary page handling new dc7addc update with correct junit imports new 30adb12 Adds small comment new 9af4125 turn on parquet 2.0 flags new e91cda9 Merge pull request #259 from Parquet/delta_strings new 2b80e47 Merge branch 'master' into add-parquet-jackson-module new cc8375c Merge pull request #258 from Parquet/optimize_scan new 1e69167 Renames jackson.shade.prefix property into shade.prefix new 8926033 Replaces org.codehaus.jackson groupId with corresponding maven property new 1ef3e9f Adds README with some explanations new ee6d882 Renames jackson.shade.prefix property to 
shade.prefix (part2) new 87864cb [maven-release-plugin] prepare release parquet-1.3.0 new a609147 [maven-release-plugin] prepare for next development iteration new f7a9023 correct byte[] storage new 5997bf5 #projection test new 96f2300 #projection test - fix - cannot use inner class as mapper new 985002e Code cleanup new b273684 ConverterTest new 99b7e52 new root directory new 94b2ec0 delete .idea directory new a717bbf merge new d708c7d parquet-protobuf added to root pom.xml new 919db0b Consistent naming protoXYZ new c8188f3 pom - latest version new 1f4a9db Code cleanup new c7c39c3 Repeated Messages test new 47cd572 Method ProtoParquetInputFormat.setRequestedProjection signature new 565638f refactor new 1d1dd2f 1. refactor: maket ThriftSchemaConverter pluggable, can use ThriftStructConverter or ScroogeStructConvert to convert class to ThriftType 2. support scrooge read projection pushdown 3. add scroogeReadSupport new ebc87de format new 0fb0173 merge master new b9e272a fix test new 36c3b66 format new 7c0d290 Code cleanup new 63b710d Code cleanup - Enum comparsions new 8ed45d0 Unnecessary unboxing new 31e4b06 Url to main parquet repo new e4e9fc2 Update CHANGES.md new 0261cd6 upgrade parquet-mr to elephant-bird 4.4 new 622a400 handler only handle ignored field, exception during will be thrown as SkippableException new 2e43df5 fixes #265: add semver validation checks to non-bundle builds new 7dac815 Merge pull request #266 from aniket486/upgrade_eb_4.4 new 79cc35d Merge pull request #267 from Parquet/handler_only_handle_ignored_fields new 5f57e46 bump maven-enforcer to 1.3.1 and remove some xml cruft new 9199f3e [maven-release-plugin] prepare release parquet-1.3.1 new a906f0f [maven-release-plugin] prepare for next development iteration new 954f39b new ElephantBird (4.3) + correct dependencies. 
new b752260 ElephantBird 4.4 + hadoop client dependency new 063edb4 Merge pull request #260 from laurentgo/add-parquet-jackson-module new 55ebcac Bumps parquet-jackson parent version new 283293f Merge pull request #269 from laurentgo/fix-parquet-jackson-parent-version new da96420 Merge branch 'master' of github.com:Parquet/parquet-mr into add_semver_checks new 4dae164 unused method in TestUtils new bc610e5 pom version 1.3.2-SNAPSHOT new 880da33 ignore jackson packaging changes w.r.t semver new 3830a15 add maven central as a repo to work around Travis build issues with semver new c1e86d8 remove snapshots=false from maven central xml new 81ab426 Make package java.parquet.proto.converters (mostly) package protected new 2207cb9 switches on enums new 0991475 Code style - small fixes new f232e77 Make ParquetInputSplit extend FileSplit new 5c6876a Revert "Make ParquetInputSplit extend FileSplit" new af880ec Make ParquetInputSplit extend FileSplit new 6664165 fix MapredParquetInputFormat exception issue caused by ParquetInputSplit extending FileSplit new c1298b7 Force <previousVersion> new 8c8cbde Merge pull request #268 from Parquet/add_semver_checks new 46b1ad0 fix bug: when enum index being written is the last index defined in the Enum, a DecodingSchemaMismatchException is thrown. 
maintain enum loopup table in EnumType new 40f9b24 name fix new ff62194 Merge pull request #271 from Parquet/fix_bug_enum_last_value_exception new b2184b8 add 1.3.1 new 8fb0b02 Update CHANGES.md new 0dfb067 [maven-release-plugin] prepare release parquet-1.3.2 new f012db0 [maven-release-plugin] prepare for next development iteration new 9bdaff9 Add code of conduct to Readme.md new ac8968e prettify a few lines new 471a693 1.3.2 new 81f33a6 Merge remote branch 'upstream/master' new c00409a exclude ParquetInputSplit from semver check which seems to have an issue with inherited method check new 21faa3d Merge pull request #270 from ledbit/master new a8be812 Readme.md - mark Protobuf support as in dev new a2691a7 Exception message new 6763f71 storage of repeated fields without extra level new bbacdf0 storage of repeated fields without extra level - missing protobuffer new b25de98 style: junit.framework to org.junit new da17462 Matching parquet and pbfields by index new 3c0ab7a List cannot be empty new 942cfe2 Dictionary enum conversion new d00eb4e Merge branch 'master' of github.com:Parquet/parquet-mr into junit_framework_to_org new e4329cd move from junit3 to junit4 new 6edfa7e ProtoWriteSupport unit tests new 8cc4cec New ProtoWriteSupport new 496e3fd Scalar Converters are part of Message converter new 5b1b79c javadoc new 02f7707 ProtoMessageConverter case new 2d9cf95 make setup calls static in tests new b929d19 Merge pull request #280 from aniket486/junit_framework_to_org new c8b7ba8 Merge remote branch 'upstream/master' into protobuf new 5ffaba9 Maven shade plugin removed new b1a6774 version 1.3.3-SNAPSHOT + shade plugin new 024d5ab build fix - deleted package new 8ecb0b2 first use current thread's classloader to load a class, if current thread does not have a classloader, use the class's current classloader to load a class. This will make sure a class not packaged in parquet but on classpath loaded properly. 
Otherwise, for example, if you set your own ReadSupport class to the Configuration object and expect it to be loaded by ParquetInputFormat, it will fail and throw ClassNotFoundException. new a1b7a31 use utility method from Configuration class to load class to avoid ClassNotFoundException new 83bb4b8 Added ParquetWriter() that takes an instance of Hadoop's Configuration. new f2f8e42 Fix to read a new avro schema... new be43f88 Make setting requested projection and avro schema more independent, so that you only need to set the Avro schema if it is different to the writer's schema. new 01bba92 Support promotion of int, long and float to wider types. new e29d26b Use a default Avro read schema when none specified in Parquet-Avro. new ab54b70 Add tests for reading Parquet files using the default Avro schema. new 0185b49 Minor changes following Julien's review new aadaae5 Revert change making field final that failed compatibility test. new 644bf00 Merge pull request #282 from tomwhite/avro-default-read-schema new 3d7d9ad Merge pull request #292 from esammer/master new 137b1e2 Merge pull request #289 from allanyan/master new 68b5314 better error messages, create ParquetScroogeInputFormat class new 045343d Merge remote-tracking branch 'upstream/master' into protobuf new 38241cc Ports HIVE-5783 to the parquet-hive module so that patches can be ported between the two code bases with ease. Note that the code base in Hive itself should be considered the golden copy and any changes made there and then ported to the parquet-hive module. new 083c513 Convert ParquetHiveSerDe back to SerDe interface to support Hive 0.10 new 1be4d6c bugfix: reorder fields in thrift struct caused writting nulls. fixed it by keeping track of which fields are being written in each level, and only write nulls when current level is finished in MessageColumnIO new 02f50f7 rename var new 94d703c Fill in default values for new fields in the read schema that were not in the write schema. 
new 0d111b1 remove fieldCount from marker new 5dccd0c format new 6496bcc Merge pull request #298 from Parquet/bugfix_reorder_thrift_fields_causing_writting_nulls new 76bbf4a [CASCADING] Provide the sink implementation in order to write some parquet files with ParquetTupleScheme new cc59a40 Don't deep copy immutable primitive types. new 808de5d Support field renaming for Avro read schemas, by means of field aliases. new de7ae6b Add explicit blank namespaces to account for change in AVRO-1295 in Avro 1.7.5. new 3151b2f Merge pull request #299 from tomwhite/avro-fill-in-default-values new 29fe0e0 Merge pull request #303 from tomwhite/avro-read-schema-aliases new cad7f56 Update poms to use thrift.exectuable property. new c48e8c1 HIVE-6456 - Implement Parquet schema evolution new 555837a Support field renaming for Avro read schemas, by means of field aliases. new 7593e65 Add explicit blank namespaces to account for change in AVRO-1295 in Avro 1.7.5. new 8102836 Merge the parquet-tools project into parquet-mr. new 8cc8bdc Merge the parquet-tools project into parquet-mr. new 588f868 Merge branch 'merge_parquet_tools' of github.com:wesleypeck/parquet-mr into merge_parquet_tools new 712e6d7 fix compile error in previous commit new 7b0778c Merge pull request #297 from brockn/master new ed08077 Don't fail if no default value specified for a new value in the read schema. new e237fc4 Don't shade Jackson since Avro exposes Jackson classes in its public API for representing default values for fields. new c7e892c merge master new b07b160 Merge pull request #262 from Parquet/scrooge_schema_converter new 70eada4 NULL tuples cause NPE when writing new 000659a Merge pull request #1 from jalkjaer/cascading_sink new 509e268 Better writing of a loop new 7043a64 Initial int96 implementation. new 77a355a Extending example and group classes for int96. new 34b90d7 Removing Int96 class, using Binary instead. new 56387e3 Remove int96 references from RecordConsumer and Converters. 
new 6b2eef9 Delegate fixed and int96 types to convertBINARY. new d7c7395 Merge Fixed dictionary with Binary dictionary. new 3fc099f Factoring out common Binary impl in dictionary writer. new 2403257 Use toStringUsingUTF8 to fix tests. new af2380f Add NanoTime to example. new a5d2de1 Add avro constructors with Configuration for #295. new 603c0dc Fix avro schema conv for arrays of optional type for #312. new 5e74bbe Add Configuration constructor in thrift writer for #295. new 8cc3e29 Merge pull request #313 from rdblue/295-add-conf new 132f75d Merge pull request #293 from rdblue/int96-support new d356578 Merge pull request #264 from lukasnalezenec/protobuf new 6063921 Merge pull request #285 from mickaellcr/cascading_sink new f93c9cf Update cascading doc with Scrooge projection down. new e392359 Merge pull request #316 from rdblue/thrift-prefix new 2d5563b Merge pull request #311 from tomwhite/avro-null-default-values-bug new b722e7b Merge pull request #314 from rdblue/312-fix-avro-array-of-optional new 3cfea0a Merge pull request #310 from wesleypeck/merge_parquet_tools new deb5e5d oauth based authentication; fix grep change new 459b29b Merge pull request #319 from Parquet/fix_changelog new a08d257 Spelling fix new eb77222 Merge pull request #320 from posix4e/master new 9899e5b fix filesystem resolution new 0b5116a Merge pull request #329 from Parquet/fix_file_system_resolution new 1920abc compress schemas in input splits new d0e548f a bit of jar size optimization new 4246d18 close gzip stream in finally new 9fdafc0 Merge pull request #333 from Parquet/compress_schemas_in_split new 737a5d5 issue #290, hive map conversion to parquet schema new 3d4311f remove originalType check for typeEquals of GroupType and add tests for HiveSchemaConverter new ba94119 protobuf dependency version changed from 2.4.1 to 2.5.0 new ee00e61 protobuf dependency version changed from 2.4.1 to 2.5.0 - commit fix new 05dea98 issue #324, move ParquetStringInspector to 
org.apache.hadoop.hive.serde2.objectinspector.primitive package new 621cf4e Added statistics to Parquet pages and rowGroups new 860e123 remove originalType check for typeEquals of GroupType and add tests for HiveSchemaConverter new 5ba0ff1 Merge branch 'master' of github.com:tongjiechen/parquet-mr new 7b5e2ec Addresses some initial comments. Javadocs, removed StatsHelper new 73d6617 [maven-release-plugin] prepare release parquet-1.4.0 new db13f19 [maven-release-plugin] prepare for next development iteration new 670c940 Merge branch 'master' of github.com:egonina/parquet-mr into stats new 594c47e Added licence to new files new 616f778 Update CHANGES.md new 44f31c5 Update CHANGES.md new 125529b issue #324, move ParquetStringInspector to org.apache.hadoop.hive.serde2.objectinspector.primitive package new e8d9763 Merge branch 'master' of https://github.com/Parquet/parquet-mr into issue324 new 9f43945 Refactored the *Statistics classes to reuse more code. Added Binary compareTo methods new 7345536 Merge branch 'issue324' of github.com:tongjiechen/parquet-mr into issue324 new 07c5472 Merge branch 'issue324' of github.com:tongjiechen/parquet-mr into issue324 new 82ec584 issue #324 remove additional tab new 47ff4ab Merge branch 'issue324' of github.com:tongjiechen/parquet-mr into issue324 new 156b186 remove duplicate code new c54cad5 compress kv pairs in ParquetInputSplits new 5207422 Merge pull request #342 from Parquet/compress_kv_pairs_in_split new 253eb6a select * from parquet hive table containing map columns runs into exception. Issue #341. 
new e1b4800 set cascading version to 2.5.3 new 6aeaa52 Merge pull request #345 from epishkin/cascading_2.5.3 new f9a8676 stop using strings and b64 for compressed input splits new ce2301e Merge pull request #346 from Parquet/compress_kv_pairs_in_split new 5b8af1f set reading length in ThriftBytesWriteSupport to avoid potential OOM caused by corrupted data new f5edd0a Merge pull request #347 from Parquet/check_read_length_avoid_oom new 3f5de76 Merge pull request #344 from szehon/master new 30810ff fix header bug new 05327c1 Added hashCode() method for Statistics class new 16d38e2 Fix bug #350, fixed length argument out of order. new 27f71a1 [maven-release-plugin] prepare release parquet-1.4.1 new bad0012 [maven-release-plugin] prepare for next development iteration new 93359c0 Added length check for comparing two byte arrays new f98de75 adding comments new 41df190 Merge pull request #349 from Parquet/null_header new b8149e9 ParquetThriftStorer new a13ae41 cleanup new 0943978 headers new 67c1e11 use own test fixtures new 6417bae 1. upgrade scrooge dep to 3.12.1 2. 
fix bug when an enum field is optional, scroogeSchemaConverter would fail new ddca03c cleanup log messages in tests new de0bfe3 cleanup log messages in tests new 9ef1be6 cleanup log messages in tests new f5c3151 Expose values in SimpleRecord new f8877f1 cleanup log messages for default codec new 110fe21 fix test runtime dep missing from pig new d093f49 reverse codec changes new 3fad816 Fix output bug during parquet-dump command new 79a4ac8 Merge pull request #352 from Parquet/ParquetThriftStorer new 5d06526 generate splits by min max size, and align to HDFS block when possible new 796b7dd do not call schema converter to generate projected schema when the projectionFilterStrubg or projectionSchemaStr is specified new 3321b67 fix enum to be upper case new f4a0900 remove unused code new b55eea0 make ParquetFileWriter throw IOException in invalid state case new eeae127 Merge pull request #367 from Parquet/ioexception new 6b7bc54 Merge pull request #366 from Parquet/avoid_convert_thrift_scrooge_class_when_projection_is_not_specified new 0a96b2c local variable of hdfsBlock new dd8c32a fix missing space new 23958b8 check maxSplit size must be greater or equal to minSplitSize new 83493c5 maxSplitSize should always be positive new 2056bfa separate out getParquetInputSplit method in the SplitInfo class, reduce LOC in the generateSplit method new fca4cc9 move parseMessageType out of the loop new 7845cc7 1. remove unused readSupportClass parameter from generateSplit method; 2. double check split min max to be postive in the getSplits method; 3. explicit import java.util.xx in test new 9814332 add more tests so the hdfsSize is not multiple of rowGroup size new ac816d9 min split size default to 0 new 83e34be add non-negative check in generateSplits method new a85b7fd better message new 4c870b0 Merge pull request #362 from nealsid/master new 8e348e6 create a getStartingPos in ColumnChunkMetaData new 00d631c make SplitInfo contain the hdfsBlock new 9705f49 1. 
check row groups are sorted; 2. add getStartingPos for BlockMetadata, which returns the startingPos for the first Column new 70707e4 use getStartingPos for BlockMetadata, which returns the startingPos for the first Column new 05c3e27 ensure SimpleRecord#getValues() is unmodifiable new 9f672d6 use mid point of a row group to decide to create a split or not new bba221d format new 72dbbdc Merge pull request #353 from Parquet/bugfix_failed_convert_to_scrooge_struct_when_enum_is_optional new ac2b15e change name to checkBelongingToANewHDFSBlock new 93d11c5 Merge pull request #365 from Parquet/generate_splits_by_min_max_size new 8aeea14 Merge pull request #335 from tongjiechen/master new c0b9622 Merge pull request #359 from mping/patch-1 new c9445a3 [maven-release-plugin] prepare release parquet-1.4.2 new 10a0af6 [maven-release-plugin] prepare for next development iteration new 76d05fa Update CHANGES.md new 7640224 Adding back the Page() and writePage() methods for backward-compatibility The methods now pass an empty Stats object downstream new 3e90b41 Merge branch 'master' of https://github.com/Parquet/parquet-mr into stats new 78491a4 adding 1.4.1 as previous version new f6a2218 configure semver to enforce semantic versioning new 9a38aec fix metadata concurency problem new 6aed528 Merge pull request #381 from Parquet/fix_concurency_problem new 3f25ad9 [maven-release-plugin] prepare release parquet-1.4.3 new 00e794a [maven-release-plugin] prepare for next development iteration new 0e334ca Use parameterized to test with and without dictionary. 
new 5d1a66a protobuf 2.5 instalation script for Travis new 636457c protobuf 2.5 instalation script for Travis - build fix new af74b79 protobuf 2.5 instalation script for Travis - pushd/popd new 5106593 protobuf 2.5 instalation script for Travis - fix new cb3e514 protobuf 2.5 instalation script for Travis - remove make check new 346b387 Merge pull request #337 from tongjiechen/issue324 new 57b0131 Merge pull request #336 from lukasnalezenec/protobuf new 50701e7 Merge branch 'master' into tweak_semver new 163bf6b Add support for DECIMAL type annotation. new a1d7260 Fix primitive type equality for fixed with different lengths. new 3af02db Add more tests for type builders. new 63ffdce Add test for decimal with unsupported primitive types. new 299e0ca Add Types builder API documentation. new 73d7558 Simplify Types API by moving repetition. new 5c80705 Update documentation and formatting. new 9ef22e6 Fix maximum precision calculation, account for sign bit. new 86501c2 Add INT32 and INT64 as supported types for DECIMAL. new acaac8b Implement code review changes. new c825e89 Remove unchecked casts from Types.Builder. new 0189ff1 Fix more code review finds. 
new db31a49 upgrade semver and add exclude for shaded stuff new 638c044 update version new 9f75dd1 Merge pull request #355 from rdblue/decimal new 0c740e0 remove exclude for Split new bcd2ec5 remove unnecessary version number in parquet-scrooge new d08313c add release 1.4.3 to changelog new 7d335d8 Merge pull request #378 from Parquet/tweak_semver new 96b94e1 Merge pull request #351 from rdblue/350-fix-int96-dictionary new ff830a9 previous version to 1.4.2 new 2678e39 Merge branch 'master' of https://github.com/Parquet/parquet-mr into stats new c98d8af adding back the parquet-hadoop methods that don't have statistics parameters, for backward comp new 041146e Fixed hadoop WriteSupportClass loading new 882740f return NullCounter when read via Cascading, but not within a cluster side job new 05b4e7c Merge pull request #338 from egonina/stats new cc28822 Added padding for columns not found in file schema new 70bb0ea fixes for converting from bytes, toString() methods, writing stats to Footer, unit testing for MAX/MIN_VALUE new 4d42afb Merge pull request #392 from egonina/stats new 10dc714 Added test for null padding new d5a8f9f Merge pull request #389 from dcw-netflix/pad-schema new 24076a4 Fixed issue with column pruning when using requested schema new fb7dba1 Updated test and remove shortcut return statement in loader new b70509d Merge pull request #397 from dcw-netflix/requested-schema-pruning new 8091a1b fix null stats new 7e4346b merging with fix_null_stats branch new e4991ff Merge branch 'master' of https://github.com/Parquet/parquet-mr into stats new 4fee0a7 Bug fix - resetting stats after writing page. 
Fixed unit test to test reading footer new 54f9b10 Cleaning up + testing small & large values new fd8d18f Merge pull request #399 from egonina/stats new 7997745 [maven-release-plugin] prepare release parquet-1.5.0 new b2f0fae [maven-release-plugin] prepare for next development iteration new 01d5157 Update CHANGES.md new a05afe2 Merge pull request #387 from ambiata/fix-writeclass new ee0b98c Merge pull request #388 from fs111/master new b767ac4 Update README.md new 859b6b4 PARQUET-3: tool to merge pull requests based on Spark new 9ad5485 PARQUET-2: Adding Type Persuasion for Primitive Types new 4ad7303 Minor fix new 9c2fab4 PARQUET-6: Create documentation on how to contribute. new 2d8ebdb PARQUET-9: Filtering records across multiple blocks new 5dffe35 PARQUET-4: Use LRU caching for footers in ParquetInputFormat. new f6c02e2 PARQUET-21: Fix reference to 'github-apache' in dev docs new fb01048 PARQUET-18: Fix all-null value pages with dict encoding. new f284238 PARQUET-22: Backport of HIVE-6938 adding rename support for parquet new 4a07b3f PARQUET-25. Pushdown predicates only work with hardcoded arguments. new 17864df Column index access support new fc2c29d PARQUET-19: Fix NPE when an empty file is included in a Hive query that uses CombineHiveInputFormat new ad32bf0 Add a unified and optionally more constrained API for expressing filters on columns new b0e26ee Only call put() when needed in SchemaCompatibilityValidator#validateColumn() new 21d871b PARQUET-56: Added an accessor for the Long column type. new 0793e49 PARQUET-57 - Update dev README to clarify two points new 0148455 PARQUET-13: The `-d` option for `parquet-schema` shouldn't have optional argument new 3a396d3 PARQUET-59: Fix parquet-scrooge test on hadoop-2. 
new b86b01b [maven-release-plugin] prepare release parquet-1.6.0rc1 new 08a3c6a [maven-release-plugin] prepare for next development iteration new 0d497c4 PARQUET-73: Add support for FilterPredicates to cascading schemes new 7af955a PARQUET-50: Re-Enable the semver enforcer new 7b415fa Parquet-70: Fixed storing pig schema to udfcontext for non projection case and moved... new 45e5810 PARQUET-69: Committer doc new 54bb983 PARQUET-62: Fix binary dictionary write bug. new 792b149 PARQUET-67: mechanism to add extra metadata in the footer new 84ebe4c PARQUET-66: Upcast blockSize to long to prevent integer overflow. new 8474f6d PARQUET-80: upgrade semver plugin version to 0.9.27 new d3cd97a PARQUET-75: Fixed string decode performance issue new 7a10506 PARQUET-8: bump scrooge-maven-plugin version new f8b06df do ProtocolEvents fixing only when there is required fields missing in the requested schema new 647b8a7 PARQUET-63: Enable dictionary encoding for FIXED. new 5dafd12 PARQUET-84: Avoid reading rowgroup metadata in memory on the client side. new 5f39948 update scala 2.10 new 24119cc upgrade scalatest_version to depend on scala 2.10.4 new f637c44 PARQUET-87: Add API for projection pushdown on the cascading scheme level new fbe458f PARQUET-88: fix pre-version enforcement new 8d878af PARQUET-24: enforce JIRA prefix new 316b568 [maven-release-plugin] prepare release parquet-1.6.0rc2 new 501e8fe [maven-release-plugin] prepare for next development iteration new 9cdcf3b PARQUET-94: Fix bug in ParquetScroogeScheme constructor, minor cleanup new 3dc223c PARQUET-92: Pig parallel control new 0eb9637 PARQUET-89: Add hadoop-2 test profile for Travis CI. new 59c58d0 PARQUET-82: Check page size is valid when writing. 
new 0c4f13a PARQUET-101: fix meta data lookup when not using task.side.metadata new 3a082e8 PARQUET-90: integrate field ids in schema new bf20abb PARQUET-96: fill out some missing methods on parquet.example classes new 0b17cbe PARQUET-104: Fix writing empty row group at the end of the file new da91299 PARQUET-64: Add new OriginalTypes in parquet-format 2.2.0. new be1222e PARQUET-107: Add option to disable summary metadata. new 31fb4df PARQUET-105: use mvn shade plugin to create uber jar, support meta on a folder new ccfca8f PARQUET-106: Relax InputSplit Protections new a29815a PARQUET-123: Enable dictionary support in AvroIndexedRecordConverter new f1da5e9 PARQUET-121: Allow Parquet to build with Java 8 new 92e6d71 PARQUET-122: make task side metadata true by default new 251a495 PARQUET-135: Input location is not getting set for the getStatistics in ParquetLoader when using two different loaders within a Pig script. new d105819 PARQUET-132: Add type parameter to AvroParquetInputFormat. new 3aa6f11 PARQUET-114: Sample NanoTime class serializes and deserializes Timestamp incorrectly new ad06e61 PARQUET-52: refactor fallback mechanism new b5f6a3b PARQUET-140: Allow clients to control the GenericData instance used to read Avro records new ccc29e4 PARQUET-117: implement the new page format for Parquet 2.0 new b7a82a9 PARQUET-145 InternalParquetRecordReader.close() should not throw an exception if initialization has failed new 8e2ea92 PARQUET-150 Update merge script issue id matching. new 23db4eb PARQUET-108: Parquet Memory Management in Java new 52f3240 PARQUET-141: upgrade to scrooge 3.17.0, remove reflection based field info inspection... 
new d70fdbc PARQUET-168: Fixes parquet-tools command line option description new 4bf9be3 PARQUET-136: NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null new 0751f97 PARQUET-174: Replaces AssertionError constructor introduced in Java7 new d7dd228 PARQUET-133: Upgrade snappy-java to 1.1.1.6 new e505e1f PARQUET-124: normalize path checking to prevent mismatch between URI and ... new b4380f2 PARQUET-142: add path filter in ParquetReader new 32a9c6d PARQUET-157: Divide by zero fix new a635f21 Update Travis CI link in README.md. new 3df3372 PARQUET-111: Updates for apache release new 8041735 PARQUET-173: Fixes `StatisticsFilter` for `And` filter predicate new 668d031 PARQUET-181: Scrooge Write Support (take two) new 05adc21 PARQUET-177: Added lower bound to memory manager resize new ce65dfb PARQUET-139: Avoid reading footers when using task-side metadata new 807915b PARQUET-116: Pass a filter object to user defined predicate in filter2 api new f48bca0 PARQUET-164: Add warning when scaling row group sizes. new 4f87e0f PARQUET-190: fix an inconsistent Javadoc comment of ReadSupport.prepareForRead new f1b5487 PARQUET-191: Fix map Type to Avro Schema conversion. new c82f703 PARQUET-192: Fix map null encoding new 36a02dc PARQUET-188: Change column ordering to match the field order. new fa8957d PARQUET-187: Replace JavaConversions.asJavaList with JavaConversions.seqAsJavaList new d084ad2 PARQUET-160: avoid wasting 64K per empty buffer. new ea81e9a PARQUET-186: Fix Precondition performance problem in SnappyUtil. 
new 998d650 PARQUET-134 patch - Support file write mode new 2583494 PARQUET-162: ParquetThrift should throw when unrecognized columns are passed to the column projection API new 5851e6d PARQUET-197 : fix parquet-cascading not writing parquet metadata file new 2d1eaef PARQUET-202 Typo in the connection info in the pom prevents publishing an RC new b2623f1 [maven-release-plugin] prepare release parquet-1.6.0rc5 new a7155a8 [maven-release-plugin] prepare for next development iteration new 12ee6b4 PARQUET-208: Revert PARQUET-197 new 3fc2854 PARQUET-193: Implement nested types compatibility rules in Avro new ba43142 [maven-release-plugin] prepare release parquet-1.6.0rc6 new cd89c88 [maven-release-plugin] prepare for next development iteration new a0c77b6 PARQUET-111: Update headers in parquet-tools, remove NOTICE. new 5acc6a5 PARQUET-97: make ProtoParquetReader#builder static new 031a762 PARQUET-172: Add parquet-thrift binary tests. new b58789c PARQUET-180: Update use of TBinaryProtocol#setReadLength. new 77826fd PARQUET-215 Discard records with unrecognized union members in the thrift write path new 9ee3a16 PARQUET-217 Use simpler heuristic in MemoryManager new 2e3c053 PARQUET-197 : Gen parquet metadata from cascading new ec6f200 [maven-release-plugin] prepare release parquet-1.6.0rc7 new cb7f6a8 [maven-release-plugin] prepare for next development iteration new fd3085e PARQUET-204: add parquet-schema directory support new b8f5d89 PARQUET-189: Support building parquet with thrift 0.9.0 new 4fea3ea PARQUET-165: Add a new parquet-benchmark module new 9a92f39 PARQUET-165: Update parquet version in the benchmark module new 0ab0013 PARQUET-210: add JSON support for parquet-cat new 4ed0bdf PARQUET-214: Fix Avro string regression. new 27ba681 PARQUET-230: Add build instructions to README. 
new bfb3145 PARQUET-220: Remove unnecessary warnings initializing ParquetRecordReader new ff7a486 Revert "PARQUET-220: Remove unnecessary warnings initializing ParquetRecordReader" new 4950ad8 PARQUET-242: Fix AvroReadSupport.setAvroDataSupplier. new f272a6e PARQUET-234: Add ParquetInputSplit methods for compatibility. new 920192a PARQUET-235: Fix parquet.metadata compatibility. new b613629 PARQUET-239: Make AvroParquetReader#builder static. new 828ff75 PARQUET-211: 1.6.0 release changes new 4f66077 PARQUET-211: Set version to 1.6.0 for release. new e101917 PARQUET-211: Set version for 1.7.0-incubating development. new f28aa71 PARQUET-252 : Support nests container types for scrooge support new 720b988 Revert "PARQUET-252 : Support nests container types for scrooge support" new b10870e PARQUET-23: Rename to org.apache.parquet. new 7c42398 PARQUET-211: Update version for 1.8.0 development. new 4f7c704 PARQUET-245: Only run tests in Travis CI if build succeeds. new 9d744f7 PARQUET-268: Downgrade scrooge-maven-plugin. new 1be3878 PARQUET-270: Adds a legend for meta output to readme.md new b287d35 PARQUET-271: Fixes parquet-tools java examples new 9993450 PARQUET-227 Enforce that unions have only 1 set value, tolerate bad records in read path new 98f54c1 PARQUET-175 reading custom protobuf class new 22c6d08 PARQUET-269: Restore scrooge-maven-plugin to version 3.17.0 new 7fc7998 PARQUET-229 Add a strict thrift projection API with backwards compat support new 890b387 PARQUET-252 : support nested container type for parquet-scrooge new b8aae90 PARQUET-272: Updates docs description to match data model new 9500c77 PARQUET-276: Updates CONTRIBUTING file with new repo info new c7d56cf PARQUET-273 : remove usage of ReflectiveOperationException to support JAVA6 new e5d9c6c PARQUET-265: Update POM files for Parquet TLP. 
new 7680fae PARQUET-254: Fixes exception message new 136c5ff PARQUET-253: Fixes Javadoc of AvroSchemaConverter new 1dbcdf2 PARQUET-274: Updates URLs to link against the apache user instead of Parquet on github new 60edcf9 PARQUET-278 : enforce non empty group on MessageType level new a458e1a PARQUET-243: Add Avro reflect support new 181affd PARQUET-164: Add a counter and increment when parquet memory manager kicks in new ded56ff PARQUET-287: Keep a least 1 column from union members when projecting thrift unions new 8769d0f PARQUET-262: Restore semver checks. new dd92a9d PARQUET-223: Add builders for MAP and LIST types new 213e952 [maven-release-plugin] prepare release parquet-1.8.0rc1 new 33a2202 [maven-release-plugin] prepare for next development iteration new 4b5cda5 PARQUET-151: Skip writing _metadata file in case of no footers since schema cannot be determined. new d6f082b PARQUET-285: Implement 3-level lists in Avro new 918609f PARQUET-286: Update String support to match upstream Avro. new 2e62764 PARQUET-266: Add support for lists of primitives to Pig schema converter new 4590f14 PARQUET-246: fix incomplete state reset in DeltaByteArrayWriter.reset() new faf5421 PARQUET-263: Release changes from parquet-1.7.0 branch new 5f48f19 PARQUET-309: remove unnecessary compile dependency on parquet-generator new 1c16068 PARQUET-264: Remove remaining references to parquet being an incubator project new ad44321 PARQUET-297: generate Version class using parquet-generator new 079bcd0 PARQUET-297: Tests for PR 213 (Version generator) new 29283b7 PARQUET-314: Fix broken equals implementations new 89321a2 PARQUET-311: Fix NPE when debug logging metadata new 412ab96 PARQUET-306: Add row group alignment new 46448e9 PARQUET-201: Fix ValidTypeMap being overly strict with respect to OriginalTypes new 5c2ba72 PARQUET-284: Clean up ParquetMetadataConverter new cb04562 PARQUET-248: Add ParquetWriter.Builder. 
new 1f3e72f PARQUET-317: Fix writeMetadataFile crash when a relative root path is used new e6ee42e PARQUET-316: Fix the benchmark module new e3b9502 PARQUET-251: Binary column statistics error when reuse byte[] among rows new 9fde653 PARQUET-320: Fix semver problems for parquet-hadoop. new c7720ca PARQUET-325: Always use row group size when padding is 0. new a747456 PARQUET-308: Add ParquetWriter#getDataSize accessor. new 2f2c8b1 PARQUET-289: Allow ParquetReader.Builder subclasses. new c334a1b PARQUET-290: Add data model to Avro reader builder new 013b445 PARQUET-152: Add validation on Encoding.DELTA_BYTE_ARRAY to allow FIX… new f4e754e PARQUET-324: row count incorrect if data file has more than 2^31 rows new 043fcde PARQUET-246: File recovery and work-arounds new 4c7d752 PARQUET-329: Restore ThriftReadSupport#THRIFT_COLUMN_FILTER_KEY new 8f898da PARQUET-292: Update CHANGES.md for 1.8.0. new 0fda28a [maven-release-plugin] prepare release apache-parquet-1.8.0 new abfe355 [maven-release-plugin] prepare for next development iteration new fcd5682 PARQUET-279 : Check empty struct in compatibility checker new be9f3cb PARQUET-331: Surface subprocess stderr in merge script new 8a2c618 PARQUET-338: Fix pull request example in README new f79c936 PARQUET-337 handle binary fields in set/map/list in parquet-scrooge new 8714dd0 PARQUET-336: Fix ArrayIndexOutOfBounds in checkDeltaByteArrayProblem new 8da9456 PARQUET-339: Add Alex Levenson to KEYS file new 07cefb8 Update CHANGES for 1.8.1 release new 4aba4da [maven-release-plugin] prepare release apache-parquet-1.8.1 new 1dd5cec [maven-release-plugin] prepare for next development iteration new 83406b7 PARQUET-340: MemoryManager: max memory can be truncated new 454fc36 PARQUET-342: Updates to be Java 6 compatible new b86f68e PARQUET-346: Minor fixes for PARQUET-350, PARQUET-348, PARQUET-346, PARQUET-345 new 2f956f4 PARQUET-341 improve write performance for wide schema sparse data new 3f36b7b PARQUET-362 - Fix parquet buffered 
writer being oversensitive to union schema changes new 01fbf81 PARQUET-343 Caching nulls on group node to improve write performance on wide schema sparse data new 2c90a9d PARQUET-356: Update LICENSE files for code from ElephantBird. new 04f524d PARQUET-361: Add semver prerelease logic. new 9962a0f PARQUET-335: Remove Avro check for MAP_KEY_VALUE. new f203d80 PARQUET-363: Allow empty schema groups. new d24ecb3 PARQUET-376: Tolerate square brackets in PR titles new 415761d Revert "PARQUET-376: Tolerate square brackets in PR titles" new 66e39fc PARQUET-375: Update current release version in README.md new 0637e2f PARQUET-360: Handle all map key types with cat tool's json dump new c381968 PARQUET-355: Add Statistics Test for Parquet Columns new b1ea059 PARQUET-381: Add feature to merge metadata (summary) files, and control which files are generated new 5294c64 PARQUET-373: Fix flaky MemoryManager tests. new 5a45ae3 PARQUET-241: Fix ParquetInputFormat.getFooters() order new 6b605a4 PARQUET-77: ByteBuffer use in read and write paths new 440882c PARQUET-364: Fix compatibility for Avro lists of lists. new 0912987 PARQUET-380: Fix build when using thrift 0.9.0. new efafa61 PARQUET-378: Add thoroughly parquet test encodings new 6308304 PARQUET-396: Extend ParquetReader.Builder<T> new f4918bb PARQUET-398: Updates dev/COMMITTERS.md new e32aa6f PARQUET-398: Add 'spena' information to dev/COMMITTERS.md new 14097c6 PARQUET-387: Improve NPE message when avro arrays contain null. new f2615d9 PARQUET-349: VersionParser does not handle versions missing 'build' section new dcd1c33 PARQUET-352: Add object model property to file footers. new a24d624 PARQUET-305: Update logging to SLF4J. new 5632640 PARQUET-99: Add page size check properties new b45c4bd PARQUET-382: Add methods to append encoded data to files. new 4916903 PARQUET-353: Release compression resources. 
new fa7588c PARQUET-334: UT test failure with Pig 0.15 new 367fe13 PARQUET-318: Remove unnecessary object mapper new fbb2c9e PARQUET-404: Replace g...@github.com.apache for HTTPS URL on dev/README.md to avoid permission issues new 368588b PARQUET-413: Fix Java 8 test failure. new 37f72dc PARQUET-212: Implement LIST read compatibility rules in Thrift new 84b2b74 PARQUET-421: Fix mismatch of javadoc names and method parameters in m... new 30ee10d PARQUET-422: Fix a potential bug in MessageTypeParser where we ignore… new c38386d PARQUET-393: Update to parquet-format 2.3.1. new af9fd05 PARQUET-432: Complete a todo for method ColumnDescriptor.compareTo() new 5769479 PARQUET-480: Update for Cascading 3.0 new 63d5ae7 PARQUET-495: Fix mismatches in Types class comments new 06a4689 PARQUET-410: Fix hanging subprocess call in merge script. new 0a711eb PARQUET-415: Fix ByteBuffer Binary serialization. new a4acf53 PARQUET-509: Fix args passed to string format calls new c26fa78 PARQUET-385 PARQUET-379: Fixes strict schema merging new 6c9ca4d PARQUET-430: Change to use Locale parameterized version of String.toUpperCase()/toLowerCase new 944291b PARQUET-431: Make ParquetOutputFormat.memoryManager volatile new c44f982 PARQUET-529: Avoid evoking job.toString() in ParquetLoader new fb46b94 PARQUET-397: Implement Pig predicate pushdown new 1f91c79 PARQUET-528: Fix flush() for RecordConsumer and implementations new 4b1ff8f PARQUET-384: Add dictionary filtering. new e9928c9 PARQUET-571: Fix potential leak in ParquetFileReader.close() new d402148 PARQUET-581: Fix two instances of the conflation of the min and max row new ac62c1c PARQUET-580: Switch int[] initialization in IntList to be lazy new dc08bb8 PARQUET-584 show proper command usage when there's no arguments new 82b8ecc PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32 new 6b24a1d PARQUET-358: Add support for Avro's logical types API. 
new 36ce032 PARQUET-585: Slowly ramp up sizes of int[]s in IntList to keep sizes small when data sets are small new 7419443 PARQUET-327. Show statistics in the dump output. new 8bcfe6c PARQUET-225: Add support for INT64 delta encoding. new 3dd2210 PARQUET-548: Add EncodingStats. new 2f22533 PARQUET-569: Separate metadata filtering for ranges and offsets. new 39a3cd0 PARQUET-560: Synchronize writes to the finishCalled variable new c3f3830 PARQUET-372: Do not write stats larger than 4k. new da69d4b PARQUET-367: "parquet-cat -j" doesn't show all records. new 1f47025 PARQUET-544: Add closed flag to allow for closeable contract adherence new 9c40a7b PARQUET-645: Fix null handling in DictionaryFilter. new 7f8e952 PARQUET-642: Improve performance of ByteBuffer based read / write paths new bd0b5af PARQUET-612: Add compression codec to FileEncodingsIT. new e036d60 PARQUET-654: Add option to disable record-level filtering. new 02ce9b0 PARQUET-663: Update README.md new 42662f8 PARQUET-389: Support predicate push down on missing columns. new a421d95 PARQUET-540: Fix Cascading 3 build thrift and SLF4J. new 626014e PARQUET-651: Improve Avro's isElementType check. new 6a62646 PARQUET-543: Remove unused boundedint package. new 60b6d5a PARQUET-667: Update committers lists to point to apache website new 5c85b8d PARQUET-511: Integer overflow when counting values in column. new ea402be PARQUET-668 - Provide option to disable auto crop feature in dump new 76a2ac8 PARQUET-669: allow reading footers from provided file listing and streams new b301d12 PARQUET-667: Add back + update committers table new 30aa910 PARQUET-601: Add support to configure the encoding used by ValueWriters new c8d78b2 PARQUET-146: Move Parquet to Java 7 new 898f3d0 PARQUET-400: Replace CompatibilityUtil with SeekableInputStream. new 255f108 PARQUET-460: merge multi parquet files to one file new 6dad1e3 PARQUET-696: fix travis build. 
Broken because google code shut down new 044de16 PARQUET-623: Fix DeltaByteArrayReader#skip. new e54ca61 PARQUET-660: Ignore extension fields in protobuf messages. new b59be86 PARQUET-674: Add InputFile abstraction for openable files. new 07a42d3 PARQUET-726: Increase max difference of testMemoryManagerUpperLimit to 10% new e6da0f6 PARQUET-685 - Deprecated ParquetInputSplit constructor passes paramet… new a0e6cc3 PARQUET-727: Ensure correct version of thrift is used new 06768d9 PARQUET-740: Introduce editorconfig new de99127 PARQUET-686: Do not return min/max for the wrong order. new 59ec4f0 PARQUET-743: Fix DictionaryFilter when compressed dictionaries are reused. new 31d0d4d PARQUET-392: Update CHANGES.md for 1.9.0. new 2a99abf [maven-release-plugin] prepare release apache-parquet-1.9.0 new 27b9934 [maven-release-plugin] prepare for next development iteration new 0116aa7 PARQUET-392: Fix svn log message in source-release.sh. new 1058b7d PARQUET-392: Fix staging instructions in prepare-release.sh. new ece4b70 PARQUET-751: Add setRequestedSchema to ParquetFileReader. new 38262e2 [maven-release-plugin] prepare release apache-parquet-1.9.0 new aa416b5 [maven-release-plugin] prepare for next development iteration new df9d8e4 PARQUET-423: Replace old Log class with SLF4J Logging new e5cd652 PARQUET-753: Fixed GroupType.union() to handle original type new 0ed977a PARQUET-768: Add Uwe L. Korn to KEYS new cf99160 PARQUET-755: create parquet-arrow module with schema converter new 4453aa3 PARQUET-765 - Upgrade Avro to 1.8.1 new 09d28fe PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed new 7987a54 PARQUET-786: 'java -jar', not 'java jar' closes #377, #374 new 4fd34e6 PARQUET-220: Unnecessary warning in ParquetRecordReader.initialize new 98c2769 PARQUET-321: Default maximum block padding to 8MB. 
new 71cff7c PARQUET-791: Add missing column support for UserDefinedPredicate new 89e0607 PARQUET-801: Allow UserDefinedPredicates in DictionaryFilter new f68dbc3 PARQUET-825: Static analyzer findings (NPEs, resource leaks) new 6fb6085 PARQUET-822: Upgrade java dependencies new 3634821 PARQUET-806: Parquet-tools silently suppresses error messages new 2fd62ee PARQUET-772: Fix locale-specific test failures. new 70f2881 PARQUET-665 Adds support for proto3 new a703ee7 PARQUET-969: Update parquet-tools to convert Decimal datatype to BigD… new fd7cfed PARQUET-196: parquet-tools command for row count & size new 1de41ef PARQUET-852: Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder new 9491d7a PARQUET-990 More detailed error messages in footer parsing new 9d58b6a Parquet-884: Add support for Decimal datatype to Parquet-Pig record reader new 2d3203b PARQUET-1005: Fix DumpCommand parsing to allow column projection new 352b906 PARQUET-1026: allow unsigned binary stats when min == max new df9f8d8 PARQUET-1024: allow for case insensitive parquet-xxx prefix in PR title new ddbeb4d PARQUET-777: Add Parquet CLI. new d55a572 PARQUET-1133 Add int96 support by returning bytearray, Skip originalType comparison for map types when originalType is null new 328c5de PARQUET-1115: Warn users when misusing parquet-tools merge new 170cfa7 PARQUET-1152: Parquet-thrift doesn't compile with Thrift 0.9.3 new c532b0e PARQUET-1153: Parquet-thrift doesn't compile with Thrift 0.10.0 new ba7b8ba PARQUET-1149: Update Avro to 1.8.2 new 132b2a8 PARQUET-1143: Update to Parquet format 2.4.0. new 81f4801 PARQUET-1156: Address dev/merge_parquet_pr.py problems. 
new 8bfd9b4 PARQUET-1142: Add alternatives to Hadoop classes in the API new da3e8eb PARQUET-357: Parquet-thrift generates wrong schema for Thrift binary fields new 9191fbd PARQUET-1141: Fix field ID handling new 2adb657 PARQUET-1077: Use long key ids in KEYS file new 3783ca4 PARQUET-1185: TestBinary#testBinary unit test fails after PARQUET-1141 new 4d996d1 PARQUET-386: Printing out the statistics of metadata in parquet-tools new c6764c4 PARQUET-1025: Support new min-max statistics in parquet-mr new b80b184 PARQUET-1197: Log rat failures new 878ebcd PARQUET-1191: Type.hashCode() takes originalType into account but Type.equals() does not new 89aeec0 PARQUET-1170: Logical-type-based toString for proper representeation in tools/logs new 6e0cc72 PARQUET-1065: Deprecate type-defined sort ordering for INT96 type. new 6a4bbe9 PARQUET-1198: Bump java source and target to java8 new 445cb9d PARQUET-1215: Add getFooter to ParquetWriter. new ad80bfe PARQUET-1208: Occasional endless loop in unit test new 8bbc6cb PARQUET-787: Limit read allocation size new b82d962 PARQUET-1217: Incorrect handling of missing values in Statistics new 3d2d4fd PARQUET-1135: upgrade thrift and protobuf dependencies new 0a86429 PARQUET-1246: Ignore float/double statistics in case of NaN new a7ca605 PARQUET-1258: Update scm developer connection to github (#462) new d54fad8 PARQUET-1183: Add Avro builders using InputFile and OutputFile. (#460) new 12bbaf3 PARQUET-1263: If file has a config, use it for ParquetReadOptions. (#464) new 9261c28 PARQUET-1189: Update CHANGES.md for 1.10.0 release. new d61d221 PARQUET-1264: Fix javadoc warnings for Java 8. new 150c578 PARQUET-1264: Fix javadoc 8 problem in VersionGenerator. new 0d55abd RQUET-1264: Fix javadoc warnings for Java 8. new ce4d1c9 PARQUET-1258: Update scm developer connection to github HTTPS. new 031a665 [maven-release-plugin] prepare release apache-parquet-1.10.0 new 625aa51 PARQUET-1512: Set version to 1.10.1-SNAPSHOT. 
new 4f945b9 PARQUET-1309: Parquet Java uses incorrect stats and dictionary filter properties (#490) new 50b1f47 PARQUET-1510: Fix notEq for optional columns with null values. (#603) new 68125a7 PARQUET-1512: Update CHANGES.md for 1.10.1. new 8ad44a9 [maven-release-plugin] prepare release apache-parquet-1.10.1

The 1888 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.