This is an automated email from the ASF dual-hosted git repository.
blue pushed a change to branch parquet-1.10.x
in repository https://gitbox.apache.org/repos/asf/parquet-mr.git.
at 8ad44a9 [maven-release-plugin] prepare release apache-parquet-1.10.1
This branch includes the following new commits:
new 576c709 Initial commit
new a8c10ef initial commit
new f3cdad3 README.md
new 2402879 Update README.md
new dbbf944 updated name
new 7ed9528 Merge branch 'master' of github.com:julienledem/redelm
new 5fb63e9 refactoring tuples to use indices instead of field names
new 741c6fd adding push parser test
new a1a04d5 move closing records at the begining of the loop
new 9b68b39 make field started and ended only once when value is
repeated, refactor out SimpleGroupRecordConsumer
new 34124ed remove depedency on Group from RecordReader
new 968fd55 remove dependency on group from the io implementation
new c4109e2 fix compilation and dependency issue
new f689f90 fix compilation issue
new 82e0324 pig Tuple consumer and writer
new 5e95561 temporarily removing thrift stuff
new f2ea25f turning off logs
new d1ceb26 Update README.md
new da7ec48 adding license header
new 0b73962 Merge branch 'master' of github.com:julienledem/redelm
new 7b98856 adding travis ci conf
new e6ecdc6 fixing source version
new cf5ba98 fix source encoding
new ab7df25 added build info
new 2302fe4 first version of the Loader/Storer
new 575c221 PrintFooter tool
new bdd3635 Add some comments and tests
new 40d512b work
new 591da70 adding support for Float and Double
new 0025768 fixed tests
new 246efae refactor of the columns
new 527beef adding license headers
new 11b8190 Merge pull request #1 from julienledem/hack_week
new 1e7152b work
new 9da016d merged
new 71b7b11 work
new 02b40db work
new 38434c6 work
new f84fa27 work
new 30e3849 work
new 194cfdb work
new e12b27f fix support for int/long and map
new d080698 adding schema validation
new da419bf fix bug regarding null values in message
new 9f9b23d work
new 9c34678 work
new 525d3ae work
new 4cd5bc1 work
new 063486d work
new ba06716 remove a use of StringBuffer
new 46a3458 first pass at adding compression
new 4bfca1f make Codec configurable
new 1a0284d fix empty string bug
new 94781b0 VarInt for String length
new 1d40ac2 add javadoc
new c2ed3ee work
new 3b447a6 Merge pull request #8 from julienledem/add_column_compression
new 32bb525 work
new 94c10a2 Merge branch 'hack_week_jco' of
https://github.com/jcoveney/redelm
new 33525b8 add JCO
new fd5bd4a record uncompressed size in footer; add detailed report of
size and compression per column
new 3b8ad08 cleanup logs
new df97795 javadoc
new e2e3eeb first stab a decoupling the Input/OutputFormat from Pig
new 76ded4f store count of metadatablocks in footer
new f9be51e Merge pull request #9 from julienledem/javadoc
new f070939 Merge pull request #10 from
julienledem/decoupling_InputOutputFormat_from_Pig
new 130e213 better hadoop layer decoupling
new 4fb9a48 moved hadoop implementation into its own hadoop package
without dependencies on pig packages
new fe6066b get the compression codec from Configuration properties
new b507f7d make block size configurable
new 5271080 Merge pull request #11 from
julienledem/move_hadoop_stuff_into_hadoop_package
new 1243fcc add status header
new 06ab2ed implement empty bag != null bag
new fbf8333 cleanup unnecessary variables
new 8774451 first stab at pregenerated Pig consumer
new 0094892 use non spillable databag for records
new 27c0255 make Map work
new ffe80dd add summary file
new 88ed7fe Support for short int columns by JCoveney
https://github.com/jcoveney/redelm rep_def_column add pig snapshot
new af992f3 fixed doc based on jco's comments
new 4a372bc Merge branch 'master' into summary_file
new ad3b459 Merge branch 'master' into preprocess_pig_schema
new 40b0cb3 Merge pull request #13 from
julienledem/fix_empty_bag_equals_null_bag
new 0c683e1 fix build warning
new bdc5a6e fix compression javadoc
new 60ca3a3 add documentation images
new daea1fd updated format diagram
new bce0f0d fix Codec Logging
new 6f4650a Merge branch 'master' into preprocess_pig_schema
new 48ef66b fix exception handling
new c73105c fix exceptions in Converters
new 4c0a1f7 Merge pull request #14 from julienledem/preprocess_pig_schema
new 0549961 Merge pull request #16 from julienledem/fix_Codec_Logging
new ea3d65e improve file format diagram
new 9385a2f Merge branch 'master' of github.com:julienledem/redelm
new ee02094 better logging and perf tests
new 9223ad5 Merge branch 'master' into summary_file
new 817b54b Merge pull request #17 from julienledem/summary_file
new 615c23c make splits contain all data blocks starting in the same HDFS
block
new b23e1f6 add missing license headers
new e186b7d fix UDFContext collision when multiple stores
new 73d4fde first stab at record reader compiler
new 0f88e29 simplified record reader; a little more of reader compiler
new 104b219 remove currentNodePath from reader and improve perf a lot
new 7133e58 Merge pull request #19 from
julienledem/remove_currentNodePath_from_reader
new c4a41dd more fsa codegen
new 072db3e change record reader init lifecycle
new 48f0e8f refactor record consumer materializer
new b6f0ca6 remove unnecessary constructor param
new eaf3171 Merge branch 'master' into improve_record_consumer_interface
new 4e5da20 introduce state object
new 27c93d0 more use of the state class
new 8f2c7e5 slight improvement to the record reader
new 7dd9300 remove dependency of column io on column store
new e735459 remove unecessary class parameter
new 5334a2d add missing license headers
new 11b6a8f Merge branch 'improve_record_consumer_interface' into
FSA_codegen
new eaeff04 refactor reader
new c4c6991 rewrite of the code generation bit (work in progress)
new 180222e rename RecordConsumerWrapper
new eba3e21 Merge branch 'master' of github.com:julienledem/redelm into
better_InputFormat_logs
new f3f02f5 Merge pull request #18 from
julienledem/better_InputFormat_logs
new 0c3a1d8 Merge branch 'master' of github.com:julienledem/redelm into
improve_record_consumer_interface
new 1954277 removed cheesy comment and hardcoded max depth based on jco's
comment
new fa4c4ba removed unnecessary whitespace
new d68307e Merge branch 'master' into
fix_schema_passing_when_multiple_stores
new 448e8c8 Merge branch 'master' into FSA_codegen
new faf756e fixed visibility
new 758bb8b Merge pull request #21 from
julienledem/fix_schema_passing_when_multiple_stores
new d71d754 Make PigSchemaConverter Static
new 19106d0 Fix merge conflict
new 676c471 Make maps work with sigle column case
new db75e0b add missing base implementations
new 7aaf6f9 removed unecessary readOneRecord() method and commented out
old code
new 770cf4d more optimizations
new e5dec2b optimizations;fix some bugs
new b896112 trigger full gc before starting
new 48d9105 Revert PigSchemaConverter to static
new ba0f9ad Update gitignore
new d202560 Incorporate Julien's comments
new d18ec37 Merge pull request #23 from jcoveney/fix_simple_maps
new 78cff39 Merge pull request #20 from
julienledem/improve_record_consumer_interface
new cd7117b some modification to understand better impact of gc
new fd93ec2 Merge branch 'master' into FSA_codegen
new d5ea045 cleanup unused import
new da4b034 add license headers
new 2a3d6af make BaseRecordReader its own class
new 2883f89 merge both switch statements to optimize
new b89750b some cleanup based on Jco's comments
new 17271e7 initial integration with the new metadata
new f25dfdc fix row count per rowgroup persistence
new 8ec8a9d fix storer problem with page storage
new d63ded4 remove string type; make schema in footer use object model;
make ints little endian
new cd9e8fe integrate the schema change from children to parent
new 3043e75 optimize buffer copy; revert parent
new 2e09b42 cleanup string type
new be00886 better support for plain encoding
new 6cdf6aa generator for the int_test_file
new f3adb6e renamed to Plain to match the Encoding name
new 781f40f PLAIN encoding comformance
new 1bd47c3 generate TPCH customer
new 07d7aa0 change children_indices for children_count in schema
representation
new 867e24b fix repetition for root
new 9c7e5f5 add compression back; make page size configurable; rename
children_count to num_children
new 18d5082 fix decompression
new 04b60ee rework compression
new d8a18ce fix PrintFooter; fix string encoding; rewrite split
generation; fix block reading logic
new 2ea2954 split stores
new 4b06058 move compression to decode time and fix compressor init
overhead
new 113f8b8 reworked converter framework
new 372b9f3 fix projection using pig schema
new b38de8a add allocated usage monitoring
new a62af63 fix bit packing
new 15f6beb expose schema to pig
new 1823220 rename the metadata package
new b598378 removed reference to red file
new 90f027e change magic number to PAR1
new e6ca25c adding license header
new 08df187 Merge pull request #25 from
julienledem/integrate_format_changes
new 951cd2f Merge branch 'master' into FSA_codegen
new adcab26 Merge pull request #24 from julienledem/FSA_codegen
new 33aad25 rename package to parquet
new 73f6997 rename package to parquet
new 24c85c7 renaming maven artifacts
new ac3a806 renamed to parquet
new bd1fde9 cleanup
new 5164918 Merge pull request #26 from julienledem/rename_to_parquet
new 4a0bb74 cleanup exceptions
new 85d8f09 removed brennus dependency (for now)
new 5141722 integrate thrift changes
new cfdcdff turn off customer test
new c5a894a split hadoop; add thrift
new f6adf0b integrate new converter; cleanup
new 78ff38c improve logs
new 9058bae improve use of summary file
new b2efb50 refactor the read/write support
new bbf7932 move to official pig release
new ad9dcbe javadoc; original type support
new 42c418c javadoc
new f2746e7 adding example output/input formats
new 77097b6 ThriftParquetOutputFormat
new 92e93b1 license header
new c5ce15e license header
new 6bf2661 Merge branch 'master' of github.com:julienledem/redelm into
parquet_thrift
new af20471 javadoc; bug fixes; thrift support; refactoring
new 25b7559 thrift read protocol; fix repetition level size in little
endian
new 9c79b08 thrift input/output format support
new ecf19dc add encoding information for the column reader; allow column
writer to specify the encoding
new e8f8429 populate encodings in column metadata
new d35c264 turn byte[] into Binary object in the api
new 932695c update dependency on elephant-bird
new 07ea133 license headers
new ee8ec11 javadoc; turn off the compatibility test for now
new 54dd652 integrate the thrift changes
new 0973e7d removed outdated comment
new c3c22a0 Merge pull request #1 from Parquet/parquet_thrift
new 003299e Style cleanup and other miscellanea (javadoc, etc)
new 591c2b9 move compatibility test to the appropriate repo
new 76fd1d8 cleanup
new 962dd9e More cleanup/renames
new 7ea3721 Merge remote-tracking branch 'origin/master'
new d4a7da5 Merge pull request #3 from toddlipcon/master
new 25608ae cleanup
new ca1f11a Merge branch 'master' of github.com:Parquet/parquet-mr
new bf0bd57 Update README.md
new b946355 thrift enum and list fixes; ParquetReadToWrite
new 5daf068 more thrift bug fixes
new 668d74d exception cleanup; creation of parquet.hadoop.api package;
thrift from bytes support; bug fixes
new e1b2f29 Fix pom.xml
new 70abea6 Merge pull request #5 from mickaellcr/fix_pom
new ed7a067 move to groupId com.twitter so that we can publish to maven
central
new 0703f01 Merge pull request #4 from Parquet/thrift_fixes
new 45138bb integrating feedback from Todd; renaming PrimitiveColumnW/R
to ValuesW/R
new 6b9366b renaming classes and packages based on feedback
new e0dc2f3 reorganizing packages and deleting old classes
new 6dff626 Merge pull request #6 from Parquet/move_to_twitter_group
new 63949a3 Merge pull request #7 from Parquet/renaming_classes
new ef6b02c fixed merge issue
new 6e33c3d fix Filesystem access issues mentioned by Dmitriy
new 5a52fe1 fix Filesystem access issues mentioned by Dmitriy
new 12b99b1 metadata file in parquet format
new 251855b better metadata file tests
new 37ad86c removed old doc
new f7ba78a Update README.md
new dfd872b javadoc
new 37b7041 better tests for new summary file
new aa6dca8 integrate thrift format changes
new dabb797 Merge pull request #8 from Parquet/metadata_file
new 11c34bf Merge branch 'master' of github.com:Parquet/parquet-mr
new bb7d9f6 Update README.md
new bdec17e integrate Todd's feedback
new 4b45e24 Merge branch 'master' of github.com:Parquet/parquet-mr
new d07c10e Fix LICENSE
new 7a6c784 Update README.md
new a524979 Remove the old license.txt
new 7bd223a improved OutputFormat javadoc and defaults
new 7d90176 Merge branch 'master' of github.com:Parquet/parquet-mr
new 408785a adding back the license header used by the maven plugin
new 7a9a750 Update NOTICE
new 68f4a5d Update README.md
new c502c21 add deploy configuration
new bf1b494 Merge pull request #10 from Parquet/allow_deploy
new 241634e better exception when reading unknown field
new de8bc0d fix java 6 compiler compatibility
new 5fe97c8 add pig schema in thrift metadata
new 65364a4 fix map of primitive; add thrift to pig compat
new c44ff2d cleanup
new ef0069e incorporate feddback; more tests
new 82d5d85 Merge pull request #13 from Parquet/thrift_to_pig_compat
new e93e35c fix metadata file in mr mode
new df3e94a change default level encoding to bit packed; instanciate
reader from page header encoding
new 4828f1d fix metadata conversion
new d81380a add test to ensure enums are equivalent
new e087908 integrate elaphantbird 3.0.8
new 071b8f4 change ReadSupport api to fix projection support
new cf6dbc3 Merge pull request #16 from Parquet/update_encodings
new 5dfa1b2 deal with elephantbird handling of numbers
new 696bce4 use constant for settings
new f9c25b6 Merge pull request #17 from Parquet/thrift_to_pig_compat2
new a4e5f02 Update README.md
new cc5cd63 Update README.md
new fce6998 avoid string decoding recoding
new c625aa6 Merge pull request #19 from Parquet/improve_thrift_perf
new f5b2cb8 add better Bytes plain decoder
new 56faa1b first stab at dict encoding
new cf5ba49 fix perf test
new 1b2694e fix offset
new 8246e0b fix perf problem with new String(bytes, offset, length,
encoding)
new 5c60ed8 remove one array copy
new 60a2925 Merge pull request #20 from Parquet/perf_improvement
new e18d38b dictionary encoding
new 827a5bc fix dictionary encoding
new 4a69511 dictionary encoding
new b41e6e2 relocating jackson inside the parquet-thrift jar
new 90b5eae add scrooge and cascading support
new 9025c87 add indirect jackson jar dep
new dd057cf merge shade_jackson, fix compilation errors
new 3d63626 Merge branch 'shade_jackson' of github.com:Parquet/parquet-mr
into scrooge_scalding
new d732183 apply jackson shading to all modules
new d006b51 Merge branch 'shade_jackson' of github.com:Parquet/parquet-mr
into scrooge_scalding
new 8f9f0c7 integrate thrift change in format
new a4f133a cleaning methods
new 7025ede Merge branch 'integrate_format_changes' into
dictionary_encoding
new 8fc53c4 improve logging
new 5d16042 Merge pull request #21 from Parquet/shade_jackson
new e9aa5d5 Merge pull request #22 from Parquet/integrate_format_changes
new 6864d2e Merge branch 'master' into dictionary_encoding
new dc585ac Merge branch 'master' of github.com:Parquet/parquet-mr into
scrooge_scalding
new 9c62f41 improve dictionary
new 0734217 add license headers
new 9656ef2 improve api; improve logs;improve PrintFooter
new 08c8f82 use published EB, fix NPE in ThriftMetaData
new d5b3b7d better logging
new 0bb5e93 first stab at rle encoding
new 4fe082c BitPacking up to 31 bits
new f7a47d3 adapt Lemire's scheme to our value ordering
new f2ff9e8 add license headers
new 07c56fc add notice
new 4611c47 writer readers for int based packing
new a0926f3 add both orders as we might want to change our encoding in
the future
new b0e9609 more tests and bug fixing the Bit packing
new e3e8159 make things that look like closeables implement Closeable
new ee8c25e Merge pull request #27 from Parquet/make_closeable
new bca0411 Merge branch 'master' of github.com:Parquet/parquet-mr into
scrooge_scalding
new bcdddf7 implement byte based batch bit packing
new c2bdea0 address comments from alex l.
new 9942ce1 move simple RLE to generated bit packing
new aa851eb remove broken reader/writer
new 61d5170 fix bug where a required field would not be created at the
right level
new 8b69ad5 address comments from @J_
new 1d21646 Merge pull request #29 from
Parquet/fix_definition_level_for_nested_required
new f866bb8 more tests for optional vs required
new ee4a1b8 Merge pull request #30 from
Parquet/fix_definition_level_for_nested_required
new 9d2df13 Merge branch 'master' into dictionary_encoding
new d64c883 mae dictionary more generic; allow converters to understand
dictionaries
new feadc9e Merge pull request #24 from Parquet/scrooge_scalding
new 31b91ee address review comments by @squarecog
new 02b283d Initial support for Avro.
new 42c38a1 Remove unchecked generics warnings.
new 61a163d Honor repetitions correctly.
new 568bd7f Add Binary.fromByteBuffer method.
new 11bb824 Remove unnecessary level of grouping for array.
new e47f3b1 Fix creation of arrays and maps in converters.
new 58d8c52 Remove incorrect record initialization to compensate for
broken support for nested records (not yet fixed).
new e365840 Create generic Parquet reader and writer for object records.
new 1605cda Avoid copying bytes if ByteBuffer is array-based.
new 5121cd5 Avoid double conversion of bytes for Avro Utf8 instances.
new 64c45cb Add test for nested records following fix in
61d5170844aaf611555a0dd63c5e24af08acf1c8
new 80adbcd Remove -Xlint:unchecked flag from the build for the moment as
it causes CI to fail.
new d3c1a34 Fix compilation with Java 6.
new 34ce924 Merge pull request #26 from tomwhite/avro
new a5d72a4 Merge pull request #31 from tomwhite/java6-compilation-fixes
new 19e0902 make converters dictionary aware
new aed56c9 integrate the new bit packing for perf
new 133d845 Merge branch 'master' into rle
new a3e8963 Merge branch 'rle' into dictionary_encoding
new 67a3577 make field private; add braces for one line if statements
new 999b214 use BytesUtils.paddedByteCountFromBits everywhere
new b125dee make initial capacity a constant
new 7cb782c make a constant for constant value; remove outragous
System.out.println()
new e548432 add new line
new 4c11b66 typo
new 280cea3 make the API treat empty fields the same as missing fields to
avoid confusion
new 1db1018 turn on validation for generate TPCH
new 3f09751 making empty fields illegal
new 7c0f1a6 rename fromSequence to concat
new 5db276f cleanup import
new 0550545 Merge branch 'master' into rle
new 43dcb03 Merge branch 'master' into dictionary_encoding
new 6b867e5 skeleton for an efficient converter from groups to cascading
tuples
new 065a3c9 working selective tuple materialization for cascading
new 593a105 short class comment for the TupleScheme
new 1f0a8a2 replace DeprecatedContainerInputFormat with
DeprecatedParquetInputFormat, should build under MR2
new 5e82439 fix up cascading and scrooge to use
DeprecatedParquetInputFormat
new 20a4bf7 DeprecatedParquetInputFormat is not abstract
new 2676de9 Treat Fields.UNKNOWN as Fields.ALL
new a49a0e9 don't create a TaskAttemptContext in ParquetReader
new 74157a0 Fixed potential Integer overflow.
new fc0c7cd integrate RLE into dictionary encoding
new 7cb711c Merge branch 'rle' into dictionary_encoding
new 4a8913e update git ignore
new 249e889 Use a simpler serialization for cascading Fields to be
compatible with older cascading versions
new 8cb82ee javadoc
new c96e794 + needs space
new d6e3866 Merge pull request #36 from 0xh3x/master
new d822ef5 Merge pull request #35 from avibryant/deprecated-mr2
new 30b461e skeleton for an efficient converter from groups to cascading
tuples
new 62df123 working selective tuple materialization for cascading
new 2f0a779 short class comment for the TupleScheme
new 1ee87d8 Treat Fields.UNKNOWN as Fields.ALL
new f2ab7a2 Use a simpler serialization for cascading Fields to be
compatible with older cascading versions
new ffebada update ParquetTupleScheme to use DeprecatedParquetInputFormat
new 9222396 merge
new 3c96e97 Merge pull request #33 from
Parquet/handle_empty_fields_as_nulls
new 634cb77 fix bug when printing a ByteBuffer based binary would consume
the buffer
new aaa58d3 code formating and license headers
new 3dbccba add git hash in jar
new 9ec3565 Merge pull request #39 from Parquet/add_git_hash_in_jar
new 6855178 Update README.md
new 2d4be43 Merge pull request #25 from Parquet/rle
new 59f4b10 Use the standard readFooters in ParquetTupleScheme
new 88690f9 Fix Avro Read/Write support to work with the union-null
optional value pattern
new 5ed6162 mvn license:headers
new c7ebfbb Fixes based on Julian's feedback
new f3ee0c9 Merge pull request #37 from avibryant/cascading-tuples
new cec7b39 Merge pull request #41 from jwills/avro-null-unions
new a1fbcfb make total size include header size
new 75ead0a turn LOGs back to INFO
new 6c26ece Merge pull request #42 from
Parquet/make_total_size_include_headers
new f5ab5eb better error message when schema is unknown
new 6c1ccb7 Merge pull request #45 from Parquet/no_schema_error
new 0c25038 read version information from META-INF
new a20750a Replace JobContext#getConfiguration calls with reflective
call.
new a67ea4e add test for hadoop2
new 5ab0918 update dependencies to hadoop-client
new fae4c56 Merge pull request #32 from tomwhite/hadoop2
new 647825b Changed two utility classes to public
new 09a54d6 Rolled back one of the public classes
new 3e4561b Merge pull request #46 from laserson/public_utils
new 45b893c add maven-jar-plugin version
new 306868d Merge branch 'master' into version
new cd25359 add library version to metadata
new be136d2 Update README.md
new f0c42ca Merge pull request #49 from Parquet/version
new e1aa798 improve memory consumption in write
new b30d7fe reduce rep and def level buffer size. 8MB * 2 * #cols is way
too much
new f334966 Merge pull request #50 from
Parquet/scale_down_overly_enthusiastic_buffer_size
new 90aca3c Merge branch 'master' into improve_mem_usage_in_write
new a0e82a8 add improved memory management in hadoop layer
new 922f6c5 add setting to turn dictionary on
new 6a4f8d0 handle case when value is bigger than slab size
new 0347e7b adjust initial column size
new f7e7dd7 more unit tests for CapacityByteArrayOutputStream
new d636d60 Allow setting compressor, block/page size for ParquetWriter
new 11d8e0e Added javadocs
new 21bb59d Oops, forgot about default compression
new 34a8fb0 Propagated default sizes to the OutputFormat
new 0add8d8 add constant and override annotations
new 7f7fa72 Merge pull request #48 from laserson/choose_compression
new d0cc3a9 check initial size
new ea3eb7b Removed getCounter for compatibility
new a53dde1 javadoc and constants
new 34d0b5d Merge pull request #52 from laserson/getCounter_compat
new 5b25bb5 add constants and doc
new 4f4c5c4 Merge pull request #51 from Parquet/improve_mem_usage_in_write
new b6d1cb0 add a validation setting to OutputFormat
new 05f103b Fix bug that prevented writing optional Avro records, arrays
or maps
new 427137d Merge pull request #54 from massie/master
new a5b478e standard ordering of keywords
new 19b369b use checkNotNull
new e85b3c2 interfaces have public members
new df1ab6f introduce contants
new 70d3eeb better javadoc
new f9784bf remove unnecessary keywords
new 25d8ff2 javadoc and cleanup
new 5c2034a license headers
new 38d8a68 Merge branch 'master' of github.com:Parquet/parquet-mr into
dictionary_encoding
new 5cedb4a license headers
new 9a30c8f Merge pull request #40 from Parquet/dictionary_encoding
new 0a7c59a Speed up Avro string parsing
new b3e9432 First pass at RLE hybrid
new 0f9eee5 Cleanup
new 2ffdab7 Add test for setByte()
new d29eeb5 cleanup preconditions
new 27d1c2c Add javadoc to ByteUtils
new c71cb25 Merge pull request #55 from laserson/fastavro
new 71bd4e2 bit packing support for LE
new cd3a123 Start tests for rle hybrid
new c9c1080 Add bit packing overflow test
new 24f6524 End to end test
new 65fb8d3 move unpack
new 63ed719 Fixup / rename RLEDecoder, fix tests
new 99853be Merge branch 'master' into alexlevenson/RLE-bit-packing-hybrid
new ded5bec cleanup comments
new 14e0574 Merge pull request #57 from Parquet/bit_packing_lsb_first
new abb6e36 Address first round of comments
new cd6a02d Merge branch 'master' into alexlevenson/RLE-bit-packing-hybrid
new d7fe1a5 Use RLE for repetition / definition levels
new 47824ef Merge branch 'master' of github.com:Parquet/parquet-mr into
add_validation_setting
new af0ccf9 better error message and javadoc
new c211582 Merge pull request #53 from Parquet/add_validation_setting
new aed1dca dictionary encoding header is now bitWidth instead of max
dictionary entry id
new 7cf85cb Fix RunLengthBitPackingHybridValuesReader
new e70652e Merge pull request #59 from
Parquet/dictionary_encoding_format_adjustment
new 2dbd0d2 Remove logic for valueCount > Integer.MAX_VALUE
new 3ff63fd Merge branch 'master' into alexlevenson/RLE-bit-packing-hybrid
new 6e65166 fix bit packing encoding bug
new 1d13a61 create and use checkedCast()
new b7f6946 Merge pull request #58 from
Parquet/alexlevenson/RLE-bit-packing-hybrid
new e0a5920 Merge branch 'master' of github.com:Parquet/parquet-mr into
fix_bit_packing_encoding_bug
new 7e3ef9f Merge pull request #60 from
Parquet/fix_bit_packing_encoding_bug
new c773446 ability to read version number from parquet jar Version
utility
new a2b7a65 when there is more than one row group the converter will get
multiple dictionaries set
new c3596a9 Add support for 4 byte length written at the beginning of rle
columns
new c4c77ba add support for ReadSupport specific info in split
new 4950149 Merge pull request #61 from aniket486/master
new e27f871 Merge pull request #63 from
Parquet/alexlevenson/fix-rle-4byte-length
new f7fbed1 fix comments
new 04ac202 Merge pull request #62 from Parquet/fix_dic_decoding_bug
new 1a30dcc Merge pull request #66 from
Parquet/alexlevenson/fix-rle-comments
new ff109bc Merge pull request #65 from
Parquet/ReadSupport_specific_info_in_split
new 5f0f929 Added filtering functionality
new ef5c143 Added avro specific functionality
new 61239a0 Added avro specific functionality
new 48bb48e Added avro specific functionality
new f3ed65b fix ValueStat max value
new 8285b62 Fixed bug querying on Name,Url
new ac5cbd1 Implmented more efficient skip algorithm
new 483dd9f Merge pull request #67 from svzdvd/master
new e440108 Add support for snappy compression.
new 7b74290 Fixed test case.
new 2519b95 Merge pull request #70 from Parquet/snappy
new be8e4e9 Merge remote-tracking branch 'upstream/master'
new 1d7a5c3 Fixing after code reviews
new e2e8edb small updates to README with feature matrix and other
improvements
new 6a0dffd fix readme links
new 08b7aeb support for schema compatibility
new 7b47079 Should not write data if the RL/DL is all zeroes
new 29ba92c add a bit more pig detail
new c98f467 Merge pull request #73 from
aniket486/write_no_data_for_no_RL_DL
new 94afb19 fix dictionary decoding bug when more than one encoding is
used
new 96bf78d Merge pull request #71 from Parquet/update_readme
new 707bdf0 add negative tests
new 364b4a0 Merge pull request #72 from Parquet/schema_compatibility
new efc5982 fix schema compat
new 18122b4 Merge branch 'master' into schema_compatibility
new 3afe647 Merge pull request #75 from Parquet/schema_compatibility
new 992a47e fix call to converter
new 5fc6728 Merge pull request #74 from Parquet/fix_dictionary_decoding
new c4b14fb Adding APL headers and test for union schema creation.
new f52a26e Renamed checkValueRead.
new 80a449d Updated from master
new caa8e51 split Plain reader so that the reader knows what type it's
reading
new 653a4cf collapse small classes into one class
new 89d6b17 refactored bit packing
new f1da1c7 Merge pull request #79 from Parquet/split_plain_reader
new 806b548 fix for schema compatibility
new 33c8dc5 Merge pull request #81 from Parquet/fix_schema_compatibility
new 43dc88d adding projection support for thrift types
new adb46b6 make splits report actual length
new 9a5d597 Merge pull request #83 from atkeano/thrift_read_projections
new 6bf0597 Merge pull request #80 from Parquet/encodings
new 6030c9e Merge pull request #84 from
Parquet/splits_report_actual_length
new 4de2744 fix bad merge
new 79a3106 reduce memory usage of metadata
new fc56631 Initial checkin for load pushdown
new 5214a65 minor fixes and refactor
new 64814a6 make fields final
new 80e72a5 Merge pull request #85 from
Parquet/reduce_memory_usage_of_metadata
new feecf58 adding tests and removing comments
new ef2aa8e Merge branch 'master' into filtered_reader
new 7e31eec Merge pull request #89 from Parquet/filtered_reader
new 3b9e6d8 Update README.md
new 91fa2c4 Update README.md
new d8e6ba3 added code review changes
new 5a5bb7f initial commit for recursive listing
new 3921742 small fixes for hadoop2 failure
new 61f2c86 add buffer to protocol pipe
new c2ad764 Merge pull request #90 from aniket486/list_recursive
new 9f41d31 Merge pull request #86 from aniket486/load_pushdown
new 02d5ed2 fix merge conflict
new aa0bc13 Add Avro specific support to AvroParquet{Input,Output}Format
new de0d0cb Merge pull request #94 from massie/avro-specific
new 3f19ce3 reduce size of splits
new dd20df1 Merge pull request #87 from Parquet/reduce_size_of_split
new f46bdaf Merge pull request #93 from
Parquet/add_buffer_to_protocol_pipe_in_master
new a80029e try github site integration
new aedcdde fix pom conf for github pages
new 4643a5c add coverage report to site
new 6b5b8b2 improve memory usage of metadata
new 54ac6b4 license headers
new f7d0987 fix compilation issue with 1.6
new 964e5da Add support for schema projection in Avro
new 62c3155 Merge pull request #96 from massie/master
new c1d67ee Add support for predicate pushdown in ParquetInputFormat
new 435b13b Merge pull request #98 from massie/matt-pushdown
new 8c1032d Merge pull request #97 from
Parquet/improve_memory_usage_of_metadata
new c7a8eaf Start implementation parquet for hive :
new 2471e51 Remove any K,V from DeprecatedXXFormat
new fcc88f3 Add support for CombineHiveInputFormat
new f5ca27b Improve column reading
new 4fe18c5 Can write complex types
new b4d1c71 Can read some complex types
new 07e54a5 rename one parameter
new 1ada3d2 Some improvements on the hive implementation :
new ebf76d4 Indentation : retab to 2 spaces. Nothing else.
new 126da3c Give selected columns to ParquetInputFormat : Done
new 910e7cd Add equals and hashcode methods to BinaryWritable
new 7f534d5 Implement a basic version of the SerDeStats object for
ParquetHiveSerDe
new 6c219a5 Fix Short object for Hive (use short for short instead of
byte :))
new bd826ec Add a simple unit test for ParquetSerDe
new 33e0131 Add some unit test in order to test :
new 1dc42f0 Improve the pull request following advices from Julien
new 74be528 Add full support for array and map reading
new d615729 Add unit test for storage
new f2d9e81 update hadoop version
new 4198153 Fix compile with abstract methods
new cc6754c Fix CombineHive bug
new cee774c Fix more combine stuff
new 174c26a Correct fix to CombineHive
new 3ee49a5 Change MapWritable to ArrayWritable (perfomance improved !)
new 8234945 Remove unused parameters
new 543cdc2 Fix the size of the value array
new bb6e2ff Hadoop 2.0 compatibility, hive 0.10
new 0ec089d Add metadata in ReadContext instead of Split
new 82fff8c Clean up ReadSupport init
new 3089e83 Manage count 0
new 21b0d97 Update with advices from Julien
new eccbba1 Improve speed for queries like count(0), in which we only
need the number of lines
new bf7263b Try to fix travis build
new 26360f3 Minor changes
new 2525587 Update getSplits in DeprecatedParquetInputFormat
new 4cf6ae1 Code review
new ed9eec8 Add "how to contribute" to README.md
new a2ec5cb Merge pull request #102 from Parquet/how_to_contribute
new 61f2260 Merge pull request #95 from Parquet/doc
new 92c450a Merge pull request #28 from mickaellcr/parquet-hive
new 273ecd4 Update Hive support status
new 67b8423 ThriftParquetReader and ThriftParquetWriter
new be49204 change default page size and add some doc
new 0fbd026 Merge pull request #105 from Parquet/ThriftParquetWriter
new b7fe532 Merge branch 'master' of github.com:Parquet/parquet-mr into
change_default_block_size
new 8a62bb3 fix doc
new b8576b9 Update README.md
new 4bc2433 [fix validation script] when boolean value is null, set it to
0 for being compatible.
new 265ef24 Merge branch 'master' of
https://github.com/Parquet/parquet-mr into
fix_boolean_default_value_for_tuple_converter
new 0e8f1f7 Merge pull request #107 from Parquet/change_default_block_size
new eac5aec 1. return compatible schema when compatible flag is set. 2.
tupleConverter set to return IntegerConverter when flag is set
new c4e8d26 optimize code format, add log info to indicate boolean will
be convert to int when compatible mode is on
new 7a4b562 add if debug statements to parquetloader
new a7c42f9 Make writer independent
new 504833e Make reader independent
new 7d1fe78 remove space, add braces for readability
new 82579c9 Merge pull request #108 from Parquet/elephant_bird_compatible
new 09ba2fb Merge pull request #111 from aniket486/master
new 3c55f58 Merge pull request #112 from tomwhite/issue-64-mr-indept
new a6e3fe7 upgrading scrooge runtime version to 3.1.1
new 2289141 Merge pull request #113 from aniket486/master
new 6218072 move github site to profile
new bb74d3d add description
new 6eec81d [maven-release-plugin] prepare release parquet-1.0.0
new 78481ad [maven-release-plugin] prepare for next development iteration
new aca5615 Update README.md
new ac0caa0 Update README.md
new ecb2dac refactro column reader
new f8dd208 simplify end of page count
new aa530c1 Merge branch 'master' of github.com:Parquet/parquet-mr into
refactor_column_reader
new 26932a5 Minor README.md fix
new c126179 remove raw type for ParquetTbaseScheme to support thrift0.5;
remove scalding dependency
new 2a32f34 Merge pull request #119 from Parquet/thrift_05_compatible
new 86ae4f8 fix wrong converter: use TBaseRecordConverter for
ParquetTBaseScheme; Add unit test for getting correct record converter
new a54414c use Mockito to mock varibles in test, fix format and variable
name
new 5a25926 Merge pull request #121 from
Parquet/fix_wrong_record_converter_class
new 6232daf fix javadoc
new 1fc0698 Fix RLE bug with partial literal groups at end of stream.
new a9e2c7d Fix Short and Byte types in Hive SerDe.
new 35c6dc6 removing github-pages-site target before releasing
new 8ffe30d [maven-release-plugin] prepare release parquet-1.0.1
new 05c73c4 [maven-release-plugin] prepare for next development iteration
new f17b83c added unit tests for parquet cascading
new 7e16d31 better format
new f327ecf format
new e4269e6 remove blank lines
new 3a3b73a Merge pull request #126 from
Parquet/unit_tests_for_parquet_cascading
new 3c86936 Merge pull request #120 from Parquet/rle_fix
new 8d66611 adding dictionary encoding for long,double,int,float
new 28da58c Fix Snappy compressor in parquet-hadoop.
new 3934855 update plugin versions for maven aether migration - fixes #125
new 3fad4bd Merge pull request #133 from atkeano/maven_build_errors
new 87228eb refactoring dictionary encoding for non string types after
comments #127
new 12bc29a split out method to facilitate the inliner job
new 99673f2 Merge pull request #127 from atkeano/dictionary_encodings
new ce8c1a4 Merge pull request #118 from Parquet/refactor_column_reader
new af45d9c fix bug of wrong column metadata size
new 3fb938b Merge pull request #138 from Parquet/fix_column_metadata_size
new 6ff0264 Implemented partial schema for GroupReadSupport
new c9b213c Merge pull request #123 from Parquet/snappy_codec
new a274684 added 3 counters to parquet for benchmarking bytes read and
time spent
new 43ad5e5 fix test
new 2cc9321 add test for no benchmark counters
new ccf32c7 remove comments
new ea62ffe add unit test
new 9394c09 fix test
new c269726 formatting
new 4288aa6 formatting
new d4ef8d9 Merge pull request #140 from
Parquet/partial_schema_for_group_read_support
new 1323e4f Merge branch 'master' of
https://github.com/Parquet/parquet-mr into hraven_counters
new 397b4c9 formatting
new f8e2658 fix incrementCounter getConfiguration method to support 2.0
new 3bebb9a fix
new 35c419c remove public Constants
new cb6d3d2 Merge pull request #141 from Parquet/hraven_counters
new 9ef85a0 merge
new 42ad701 fix test file path
new 4d1b3e0 add unit test
new 09bfe99 Merge pull request #124 from Parquet/serde
new 92ce68d fixed
new 77bede5 add test
new fb67069 fix space format
new 37bd05d Merge pull request #142 from Parquet/fix_total_size_row_group
new f61a123 Merge branch 'master' of
https://github.com/Parquet/parquet-mr into fix_empty_encoding_col_metadata
new d2878f8 Merge pull request #143 from
Parquet/fix_empty_encoding_col_metadata
new 8f93adf Map key fields should allow other types than strings
new 9adb8e2 code review changes
new c22a357 Merge pull request #144 from aniket486/map_support
new b500681 add getStatistics method to parquetloader
new e5b767a Add some nested type tests and fix Map handling
new 8cc147b Add map and list to in/outputformat unit tests
new c76880d Merge pull request #146 from Parquet/hive_nested_types
new 808a90d changing default block size to 128mb
new 4202efb Merge pull request #149 from aniket486/change_block_size
new aab7b4b code review comments for stats
new 4b4fb0e Merge pull request #145 from aniket486/stats_loader
new bee8378 [maven-release-plugin] prepare release parquet-1.1.0
new 62cc2c2 [maven-release-plugin] prepare for next development iteration
new bbae83d Create CHANGES.md
new 7b2ef26 add thrift validation on read
new 8a8354b add better error message
new 71a6d88 add better error message
new c8ba085 Merge pull request #150 from Parquet/add_thrift_validation
new 945d1bd [maven-release-plugin] prepare release parquet-1.1.1
new 0784dc9 [maven-release-plugin] prepare for next development iteration
new 91c1711 fix projection on required fields and refactored unit tests
for column IO
new 69ef1f4 fix file path
new 413418e added release 1.1.1
new c023d63 improve thrift error message
new bde6493 Merge pull request #153 from
Parquet/fix_projection_required_field
new ec46329 use globbing syntax to specify manual pushdown in
ThriftReadSupport
new 17e2511 Merge branch 'master' of
https://github.com/Parquet/parquet-mr into
manual_pushdown_for_thrift_read_support
new 30359b4 remove TODOs and fix format
new 3553c02 add site target to update-github-site profile
new 874e470 indent fix, remove tabs
new 57b1e0f Merge pull request #156 from Parquet/fix_site
new 459a8a1 make counter works in DeprecatedInputFormat, which is used by
cascading
new d20e5f2 fix tests
new 00065cd change filter key name to parquet.thrift.column.filter,
remove extra filter parameter from ThriftSchemaConverter
new bb06859 add license and comment
new e371fbb add comment
new 6dfd975 Merge pull request #159 from Parquet/counter_for_mapred
new 6b96924 Merge pull request #155 from
Parquet/manual_pushdown_for_thrift_read_support
new b045ac1 Resource leak in
parquet.hadoop.ParquetFileReader.readFooter(Configuration, FileStatus)
new 848fa8e support schema evolution
new c32be9e thrift schema evolution support
new f369a13 validate output
new f0d30df refactor schema converter
new 8faaaf0 support projection on only key of a map
new 2043741 add thrift idl for testing
new 7f08eee add license headers
new 1a711f0 turn off projection from scrooge
new eb15665 Add test cases for reading/writing Avro records with empty
arrays and maps.
new 50feb33 Fix tests for reading and writing Avro records with empty
arrays and maps.
new 0cedaf2 remove debugging code from hot path
new 9594bba address review comments
new 6f25a0f Correctly handle Avro records with empty maps and arrays.
new 26d09f4 javadoc
new 8c51a42 better error message
new 8fa09f0 Merge pull request #163 from Parquet/thrift_perf
new 80c3a2a Merge branch 'master' into schema_evolution
new 60a3468 fix test
new cfc91fc almost there... now working on not to use thrift class, so
it's compatible with scrooge
new 00a5d5b migrated to using ThriftStruct for schemaConverter, do not
use thriftClass
new 7387a61 passed all test, fix map, removed tests for pull in required
fields
new 147a3f0 start! do not check required field, failing test
new 55c14bf javadoc
new e5cb3c8 fix test
new e9f2550 parameterize dictionary
new 12d1ac4 fix noisy warning
new 47116ad fix schema merging
new d4dbf0b Merge pull request #160 from adityakishore/master
new 05a0106 Merge pull request #161 from Parquet/schema_evolution
new 5b36d9c make buffered by default
new 2de2567 Merge pull request #154 from Parquet/add_thrift_validation
new 4170539 [maven-release-plugin] prepare release parquet-1.2.0
new c4515db [maven-release-plugin] prepare for next development iteration
new b3efce2 fix compilation problems
new 3c274d1 Merge branch 'master' into fix_avro_empty_maps_arrays
new 70226b9 distinguish recoverable errors
new 68fa6cd fill in missing fields, only for str now, will refactor to
visitor pattern
new 7dfa864 visitor pattern for string, test passed
new 1bf9d5f extracted inner classes from ProtocolEventsGenerator
new c59e82f implemented all dummy values
new 9a1e295 merge master
new 6f374b7 store thriftType in converter[fix merge error]
new 3e160d9 add license headers
new b4a8eb1 fix test
new 3060f85 added unit tests
new 8aadc0a fix bug, use a new list for fixed events
new b812389 inline some classes
new f702fdf better naming
new 133b252 add missing file
new 073e202 fix test path
new eff7237 remove converted Type
new d7b0083 Add empty map and array to test Avro schema all-minus-fixed
and add empty map and array fields to parquet-avro test that tests fields of
all (except fixed) types.
new 365d84e prepare for commit, remove format diff
new 5ca7671 Re-enable test for fixed type fields in Avro TestReadWrite.
new ffbdf6d visitor pattern for schemaConverter
new 2b2837f refactor matching filter
new 079e295 rename
new 7b68b47 sucess: compile scrooge generated classes in parquet-thrift
new 5393833 migrated tests to parquet-scrooge [tests passed]
new d9ce726 add test in scrooge [only maven passed]
new 0570f46 better fallback mechanism
new 015ed30 fix oom error dues to bad estimation
new 1e472d2 Merge pull request #167 from Parquet/fix_oom
new 92f58ac [maven-release-plugin] prepare release parquet-1.2.1
new dae37cf [maven-release-plugin] prepare for next development iteration
new 1d92804 Add typeLength to ColumnDescriptor.
new a7ba48b created ScroogeSchemaConverter
new e2d3bb2 [style]fix if...else in ConversionPatterns
new 64e6d82 [style] add spaces around =
new 99b6dfc remove julien's TODO
new bbc0aa7 Merge pull request #166 from
Parquet/avoid_pruning_required_fields
new 9d84697 merge master
new f6f3eaa broken tests for scroogeRead
new 3803d2d Plumb FIXED type length from Avro schema through to Parquet
metadata.
new b4c45d3 fix bug, missing break in thriftSchemaConverter
new dcc0d81 Merge branch 'master' into fixed_len_byte_array
new 04784c2 test pass
new ceef971 remove unused compat.thrift
new 2a2696d format
new 78b3f86 update scrooge denepdency, add unit tests for reading in
scrooge
new 249581d add TestCase for scrooge schema converter
new c1f3512 change some ParquetOutputFormat interfaces to mirror
ParquetInputFormat (and be useful for writing a DeprecatedOutputFormat)
new ce6bfcc add a DeprecatedParquetOutputFormat to mirror
DeprecatedParquetInputFormat
new 7adc264 add another getRecordWriter overload
new 9ae1d88 add Sink functionality to parquet.cascading.ParquetTBaseScheme
new a00fd5e remove tests that check that TBaseScheme doesn't support
writes
new 521d081 add some convenience methods (from ParquetOutputFormat)
new 93d6770 add a simple test for DeprecatedOutputFormat
new 34bbb90 missing copyright notice
new bba9775 two unused imports
new 3c65205 field requirement depends on if the getter returns option
new e2fec1c add optional map field to thrift file
new 234a1cb extracted key and value type from map and optional map
new 9a5eea0 working on map
new d32e65e downgrade scrooge version to 3.6.0, which is the latest
version on maven central
new dd02df0 added map test
new 8b84a9e specify scala version for scrooge
new 5aa7a68 accidentally deleted a space
new b11e2a0 Plumb type_length for FIXED types through to reading pages.
new 4e82ab6 Add methods to write fixed Binary without prepending length.
new 1f63013 Added two boolean options for record filters.
new 58051d0 Added functionality to allow users to implement functions to
be used as predicates.
new 232d521 use class.getName
new 038a400 update scrooge to 3.8.0
new 0f9e39b Merge pull request #165 from
Parquet/distinguish_recoverable_exception
new f8ac0f0 Initial end-to-end write and read support for Avro FIXED
fields without runtime exceptions, but still with data representation issues.
new 4f1493b Fix broken tests. Test failures encountered previously were
due to broken tests.
new 08b45b0 Add fixed_len_byte_array to oneOfEach in TestColumnIO.
new 201d80c Merge branch 'fixed_len_byte_array' of
https://github.com/davidzchen/parquet-mr into fixed_len_byte_array
new e8c2a39 better log messages
new 562e811 make binary dictionary encoding use fastutils; fix tests
new f98cd39 shade fastutil and keep only used classes
new 1c9c19c Merge pull request #171 from Parquet/scrooge_tests
new bce04eb add 1.2.0 and 1.2.1
new 8305bdb Add FixedBinary type by creating a wrapper class around
Binary and plumb FixedBinary through for read and write support for
FIXED_LEN_BYTE_ARRAY. Undo change to add FIXED field to oneOfEach schema for
parquet-column TestColumnIO for now.
new a48f56f Re-add FIXED_LEN_BYTE_ARRAY to oneOfEach and plumb through
FIXED support for example Group. Test still fails and need to solve read issues.
new 1c99a11 rename variables for readability
new 108509f Merge pull request #173 from Parquet/better_log_messages
new bbf3448 add overloaded getFooConfiguration(JobContext) methods to
ParquetOutputFormat
new 310e551 throw the writeSupportClass as part of the exception message
if instantiation fails
new 20f3f46 continue renaming
new 6da7594 fix problem with projection pushdown in parquetloader
new cc59cb8 Use ValuesWriter and ValuesReader specific to
FIXED_LEN_BYTE_ARRAY rather than overloading on a FixedBinary class.
new 3e02a84 Merge branch 'master' into fixed_len_byte_array
new e37cd2b Add fixed field to parquet-avro TestSpecificReadWrite.
new ebe07c6 changes as per code review comments
new a73e73c changes as per code review comments for test
new 4694001 Merge pull request #175 from Parquet/pig_projection_pushdown
new 51d3332 Merge pull request #174 from Parquet/readability
new b7a39c5 [maven-release-plugin] prepare release parquet-1.2.2
new 0688404 [maven-release-plugin] prepare for next development iteration
new 3b5d32b Add new Vin field to Avro TestSpecificInputOutputFormat.
new e6ebda9 Update CHANGES.md
new bc31bda Update CHANGES.md
new e6fab06 Document why FIXED_LEN_BYTE_ARRAY is not supported with Avro
specific schema right now.
new e242085 add an empty constructor for ParquetTBaseScheme (which only
works for reads)
new 3105009 add read and write tests for ParquetTBaseScheme
new 458dd70 remove redundant test
new 3b7359b Re-enable tests for writing FIXED for Avro Specific records.
Preliminary end-to-end for writing FIXED but write is still not completely
correct yet.
new 8e4278b missing test resources
new 1b326b7 Fix reflection for converting fixed Binary to Avro
SpecificFixed. Ensure that FIXED values are written using the FLBA
PlainValuesReader when dictionary is enabled.
new 12d41ac De-fluffify inadvertently added whitespace changes.
new e31d46c merge master
new a79eab7 Complete support for supporting FIXED_LEN_BYTE_ARRAY for Avro
SpecificRecord. Add syntax to specify type length for FLBA type fields to
MessageTypeParser.
new 24d7267 Remove print statements.
new 976c68a Merge branch 'master' into fixed_len_byte_array
new 5e0dba7 upgrade scrooge to 3.9
new ca7da65 basic support for map
new 6a6613f tests all primitive key types in map
new 49f3ad1 Add support for reading FIXED_LEN_BYTE_ARRAY to Pig support.
new 5ab6ccc Merge pull request #169 from
davidzchen/fix_avro_empty_maps_arrays
new 0822e32 add unit test for primitive value for maps
new 9cd6737 Add comments to new files.
new d9ced33 test optional map
new f45b384 convert list and unit tests
new 5410381 convert set and unit tests
new d0fc6a0 Correct schema syntaxes for TestHiveSchemaConverter.
new 6ffc1b9 implemented conversion for enum
new bfcb120 implemented map with nested structure, TODO: tests failing
since the default requirement can not be determined
new 835f12e refactor code
new 4b2cb26 Added unit tests for predicates. Got predicates compiling,
and passing on tests.
new 1ee4232 Merge changes from master that fix handling empty Avro arrays
and maps.
new 52c32a3 Removing predicate functions to prepare for pushing or/not
filters. Limits number of features pushed.
new b49018b Merging in changes from main repository that I have forked
from to minimize work after pull request.
new 2c5e07f Add support to AvroWriteSupport for writing out records with
maps containing Utf8-type keys.
new 3778e45 Pulling in clean modifications for adding ColumnPredicate
functions.
new a2895bf Remove print statement.
new 0ad94f9 Fix a maven warning about a missing version number.
new 6ec199d Disable the time read counter check in
DeprecatedInputFormatTest.
new 7802a9a Update ParquetReader to take Configuration as a constructor
argument.
new bb52d33 Merge pull request #179 from fnothaft/master
new a258aae Merge pull request #182 from wesleypeck/fix_maven_version
new a146ebb Merge pull request #183 from wesleypeck/fix_timeread
new e20c490 Merge pull request #184 from
wesleypeck/parquet_reader_projection
new 753473c Change syntax for fixed_len_byte_array to placing length
parameter after type name rather after field name.
new 9aad641 Move reflection checks for specific Avro Fixed type into
FieldFixedConverter constructor.
new a8d99d9 add an assertion to check the output created by reading with
ParquetTBaseScheme
new 2ffde6a Merge pull request #172 from
colinmarc/cascading-tbase-write-support
new d24f4a7 fix
new 4e6863a add checker
new 6d9d2b3 add test
new f325418 generate_json
new 5bd87b1 check compatible
new 8856e45 todos
new bd311f5 fix_test
new 001e3de map checker
new 5bf6127 SetChecker
new 5427f44 list checker
new 2fb1f7d accept visitor
new 64b2f72 fix
new 0d39e1c add compatibility report
new 6f0f236 fix tests
new 2b62da3 requirement check
new 2731c0f refactor tests
new 2ed9b50 fail when required field is added
new fdb0725 add tests for list set map
new 9652d97 add parquet-pig-bundle
new e3292ed Merge branch 'master' of github.com:Parquet/parquet-mr into
pig_BUNDLE
new ff4d13a add null check
new 5adf79f Merge pull request #180 from davidzchen/fix_avro_utf8_map_keys
new 7247538 compatibility runner print more detailed info
new b3b0bbb fix indent
new 1e61b40 fix version
new 7c6ba3e Merge changes from master.
new 989e9dc Plumb OriginalType through to ConvertedType in file in
ParquetMetadataConverter.
new e40dcfb Merge pull request #181 from davidzchen/fixed_len_byte_array
new 5303197 Merge changes from master.
new c1ac1af Merge pull request #186 from Parquet/pig_BUNDLE
new 1ae6772 add dummy file to generate source jar
new b168fd1 [maven-release-plugin] prepare release parquet-1.2.3
new 1326c00 [maven-release-plugin] prepare for next development iteration
new fd3b05c release 1.2.3
new cfd63fd Fixed issue with test case that was causing runtime error.
Was trying to call getInteger on long...
new 308c1b4 Added two boolean options for record filters.
new eb35ba8 Added functionality to allow users to implement functions to
be used as predicates.
new c6a4d18 Added unit tests for predicates. Got predicates compiling,
and passing on tests.
new 8be341f Removing predicate functions to prepare for pushing or/not
filters. Limits number of features pushed.
new 64921da Merge branch 'master' of github.com:fnothaft/parquet-mr
new 3edf60d Manually merged in conflicts in TestFiltered.java.
new 0a36e35 Fixes #189: NPE in DictionaryValuesWriter.
new fd2935b Merge branch 'master' into plumb_original_type
new 714335d compare json
new 0d25979 Merge branch 'master' of
https://github.com/Parquet/parquet-mr into compatibility_checker
new 820fb75 remove unused test
new dc425e4 show field name when they are not compatible
new 5cad37b remove unused command from CompatibilityRunner, add comment
for rules used in compatibility checking, add license header
new 82052b8 Merge pull request #191 from Parquet/compatibility_checker
new 09bcb1b fix comment
new eff1e5f Merge pull request #192 from Parquet/comment_fix
new da05b13 [maven-release-plugin] prepare release parquet-1.2.4
new 6dc4732 [maven-release-plugin] prepare for next development iteration
new ff55567 Update CHANGES.md
new 04ad0c4 Merge pull request #190 from wesleypeck/fix_dvw_npe
new 2020142 refactor serde to remove some unecessary boxing and include
dictionary awareness
new 50bd144 Merge pull request #194 from Parquet/hive_perf
new 763dfde Fix for columns list missing from the conf
new 2660598 Update README.md
new 422dfe0 Updated files to add applyFunctionToBinary, and add specific
interfaces for primitive types.
new 82f882f Merge branch 'master' into plumb_original_type
new 10f266a Cleaning method signature for binary case.
new 73c8629 Misunderstood previous comment. Fixed binary predicate.
new cf0ee72 Merge pull request #188 from fnothaft/master
new 256a3a1 Fix issue 193: RLE decoder reading past the end of the stream.
new d4eeecc Merge pull request #197 from Parquet/issue-193
new d9e5f0b Implement correctly Settable inspectors
new c5f68c5 Extract primitive inspectors and instantiate them only once
new 4bdaec0 Fix #177: Inspect key when accessing maps
new 9064500 Add some javadoc to clarify
new 22cf7fe Inspect keys only for a few types in parquet hive maps
new c9146a6 Merge pull request #196 from Parquet/hive_fixes
new a991eff Merge branch 'master' into dictionary_changes
new 0a76cc2 Fix #198: simplify TupleWriteSupport constructor
new f52d35b Merge pull request #164 from Parquet/dictionary_changes
new 5601394 make static field final
new a905704 Merge pull request #199 from Parquet/simplify_tuple_write
new a736c62 Merge branch 'master' into plumb_original_type
new 9435918 Fix requested schema when recreating splits in hive
new 005bc68 add null check for EnumWriteProtocol
new 8edc893 Initial commit
new 5c46f05 initial commit
new dfa27b4 Merge branch 'master' of
https://github.com/lukasnalezenec/parquet-protobuf
new dd536a4 Delete todo.txt
new ab4cb69 Make the ParquetLoader.inputFormatCache HashMap a WeakHashMap
in order to free memory for long running processes that do not leverage caching
new 5f29f43 use a new string in order to enforce weak reference on the key
new 93780e0 use a new string in order to enforce weak reference on the key
new cf1f442 use a new string in order to enforce weak reference on the key
new 21efc9b use cascading 2.2.0
new 8edc102 throw ParquetEncodingException
new a148e16 Merge pull request #205 from fs111/master
new be83477 Merge pull request #203 from
Parquet/check_null_for_enum_write_protocol
new 3b63d13 fix comment
new c5ce1fa fix comment, remove size
new a34df07 Merge pull request #204 from
aaghevli/ParquetLoader.inputFormatCache
new 4649453 Merge branch 'master' into plumb_original_type
new e337bd2 Protobuf conversion over Java types
new a6aa8f7 [maven-release-plugin] prepare release parquet-1.2.5
new 9c5bdb6 [maven-release-plugin] prepare for next development iteration
new 170797f Create PoweredBy
new 9a91418 Create PoweredBy.md
new 05c0f06 Delete PoweredBy
new 6f1e812 Update PoweredBy.md
new 75a6102 Update PoweredBy.md
new 12a1cd7 Update PoweredBy.md
new 70a433b Update PoweredBy.md
new 81a1af0 improve fallback for IntDictionaryWriter
new 1a4bb6a Merge pull request #206 from Parquet/PoweredBy
new 11f30fa fix bug, add rawDataByteSize for dictionaryValuesWriter to
decide if fall back to Plain encoding or not
new be6a4ae fix bug: reverse dictionary lookup for fallbacking to plain
encoding
new 4d55b59 improve fallback for float
new 1afdf14 minor fix, the length used in RLEValuesReader
new d942b45 format
new 55a451c Merge branch 'master' of
https://github.com/lukasnalezenec/parquet-protobuf
new 3c99aa3 improve fallback for double
new b84e272 Merge pull request #207 from Parquet/fix_offset
new 245d43e use primitve array for int, float , double, get rid of auto
boxing,unboxing
new c9b768f improve long fallback
new bee6755 improve binary fallback
new d33aa40 bug fix: separate fallBackDictionaryEncodedData to a method,
will always be called when fallbacking to plainEncoding
new 1278cde remove unused import
new 65ca5ed Copyrights in converters
new 492da11 remove hash lookup and unused comments
new edfd7d9 return raw data size as bufferSize in dictionaryValuesWriter
new 66900aa more comment
new 7427a89 revert fixing page cutting, fix bug, raw data size should be
long
new 198f554 revert revert.. use rawDataByteSize as buffered size in
DictionaryValuesWriter
new a7de264 Specification of written protobuffer class in output format
new 2e78704 Code cleanup
new 090a2a4 Code Cleanup
new 1bec97f Projections in read support
new 40ae3fb artifact version changed to 1.2.5, unused dependencies
removed.
new 0d47734 Add test on DeprecatedParquetInputFormat.getSplit()
new f479aee Merge pull request #208 from Parquet/improve_dic_fall_back
new 7c2785f Merge pull request #202 from Parquet/hive_requested_schema
new 6e90041 Merge branch 'master' into plumb_original_type
new 59bd08b One of the constructors in ParquetWriter ignores the enable
dictionary and validating flags.
new 4849155 Merge pull request #210 from wesleypeck/fixwriter
new 402e96d Wrong merge
new 4daff70 group parquet-format version in one property
new 8af5a22 Fix Binary.equals().
new 0b3400a Merge pull request #215 from Parquet/binary_equals
new 31aaa53 Merge pull request #213 from
aniket486/parquet_format_pom_refactor
new f4d6e17 Merge branch 'master' into plumb_original_type
new 0334948 parquet-hive should ship and uber jar
new bbfa4ac Address comments on pull request
new 700c223 Merge pull request #220 from brockn/master
new 1028fb9 make pig, hadoop and log4j jars provided
new e54735a Merge branch 'master' into cleanup_dependencies
new 61af0b9 Merge pull request #221 from Parquet/cleanup_dependencies
new 014f583 [maven-release-plugin] prepare release parquet-1.2.6
new 9424018 [maven-release-plugin] prepare for next development iteration
new de81ee8 changelog for 1.2.5 and 1.2.6
new 6b5d2d1 fix bug: set raw data size to 0 after reset
new 8e1110b Merge pull request #222 from
Parquet/fix_dic_fallback_page_cutting
new f4ad9df refactor encoded values changes and test that resetDictionary
works
new e0c5ac8 Merge pull request #223 from Parquet/dictionary_reset
new ab7959d [maven-release-plugin] prepare release parquet-1.2.7
new f587471 [maven-release-plugin] prepare for next development iteration
new 493bb9f Changing read and write methods in ParquetInputSplit so that
they can deal with large schemas (avoiding use of writeUTF and readUTF which
are limited to 65536 characters).
new d2ccc72 Breaks parquet-hive up into several submodules, creating
infrastructure to handle various versions of Hive going forward.
new f18bc49 enable globing files for parquetTupleScheme, refactor unit
tests and remove binary test fixture
new d7c8467 Merge pull request #224 from dave2718/master
new d7994dc add changelog tool
new 60c6512 Updates Hive 0.12 compatability patch by adressing all
comments from Julien's review plus a few additional cleanups, specifically:
new 4d13df5 encapuslate getFooter into a separate method
new c31a6be Merge pull request #228 from
Parquet/glob_files_for_parquet_tuple_scheme
new c2499da [maven-release-plugin] prepare release parquet-1.2.8
new e1d335b [maven-release-plugin] prepare for next development iteration
new 842500e [maven-release-plugin] prepare release parquet-1.2.8
new 6cb038c [maven-release-plugin] prepare for next development iteration
new 3b4ae5e Merge branch 'master' of https://github.com/Parquet/parquet-mr
new b297c73 optimize chunk scan; fix compressed size
new 476b8ea Merge branch 'master' into plumb_original_type
new e1ce063 check if pig is loaded when writing pig metadata
new 7dfd436 format
new 7fa1b6a make cascading a provided dependency
new 3b829a2 refactor get codec logic to remove duplication in
DeprecatedParquetOutputFormat
new 70f29c7 add cascading dependency to scrooge, and add
cascading.version propertie in project pom
new 6fa653b Merge pull request #236 from
Parquet/make_cascading_a_provided_dependency
new 8b0d05c Merge pull request #229 from Parquet/changelog_tool
new 407a52d fix missing codec
new 7641feb Merge pull request #227 from brockn/master
new 716a030 remove lzo test and lzo dependency
new 0b61cd9 Merge branch
'not_write_pig_meta_data_only_when_pig_is_not_avaliable' into
handle_codec_not_found
new 5a04096 license header
new 491481e Merge pull request #235 from
Parquet/not_write_pig_meta_data_only_when_pig_is_not_avaliable
new f7b2cd7 make CodecConfig a factory
new 2e3a370 restore getCompression methods in ParquetOutputFormat for
compatibility
new 0810736 fix pom version caused by bad merge
new 3db0d58 Merge pull request #238 from Parquet/fix_version
new 8958626 Merge branch 'master' into handle_codec_not_found
new e879680 Merge pull request #237 from Parquet/handle_codec_not_found
new 92a47b2 Fix hive map and array inspectors with null containers
new a39ad4c fix loader cache
new ca01d15 make the cache use a SoftReference
new 090d542 Update CHANGES.md
new aca1d8b Merge pull request #234 from Parquet/optimize_chunk_scan
new c95cb21 Merge pull request #239 from Parquet/hive_fix_null_maps
new 760367b Update reference to 0.10 in Hive012Binding javadoc and remove
some trailing whitespace I noticed when while updating the javadoc.
new 54308f7 Merge pull request #241 from brockn/master
new c73754b use latest stable release of cascading: 2.5.1
new 3f75f0e Merge pull request #233 from fs111/master
new bb9d898 Merge pull request #240 from Parquet/fix_loader_cache
new 22282a9 upgrade elephant-bird version to 4.3
new 600e7c9 Merge pull request #242 from Parquet/upgrade_eb_to_4_3
new 7436d8f add source to parquet-hive-binding
new eb4966f [maven-release-plugin] prepare release parquet-1.2.9
new a6f140c [maven-release-plugin] prepare for next development iteration
new e5ed117 Update CHANGES.md
new 8e23c24 add parquet cascading integration documentation
new 9df136c fix typo
new 99798b5 fix grammar
new 955cd7e plural for records
new 847df8f fix changelog
new 59601d7 Update CHANGES.md
new a347481 improve changelog
new 17146c3 Merge branch 'master' into plumb_original_type
new e2d819c Loading correct pbClass to ProtoSchemaConverter
new 08a204d Depricated init override removed
new 83f0646 pom.xml version 1.2.10-SNAPSHOT
new 0517253 TestUtils refactoring
new c590038 Obsolete test removed
new 5bb9e8d integrate parquet format 2.0
new 314ac2b Merge pull request #245 from
Parquet/integrate_parquet_format_2
new 652b0fe Merge branch 'master' into plumb_original_type
new e36b2f0 implement error handler
new 8269a6f handle extra field in data
new 3d4513f add checkEnum
new 564f370 add tests, fix bug
new da4b7fd refactor
new bdf5d6b Merge pull request #187 from davidzchen/plumb_original_type
new f5eb89d Merge pull request #244 from Parquet/feature/error_handler
new e29c2df fix when field index is greater than zero
new e94b392 format
new cd00dc8 Merge pull request #247 from
Parquet/fix/detect_extra_field_when_index_is_not_start_from_zero
new aad047a [maven-release-plugin] prepare release parquet-1.2.10
new 5f13c8c [maven-release-plugin] prepare for next development iteration
new 0743b60 Update CHANGES.md
new 0a01dae Use ContextUtil in tests to avoid dependency on parts of new
MR API that are incompatible between MR1 and MR2.
new 0df24f0 Rename ParquetInputFormat#addInputPathRecursively to avoid
clash with non-static Hadoop 2 method of same name on FileInputFormat.
new ea9fd20 Fix syntax error in test that Pig 0.12 complains about.
new e83778a make summary files read in parallel; improve memory footprint
of metadata
new f21fb31 Merge pull request #248 from
tomwhite/hadoop-2-compatibility-fixes
new 884a5e5 Merge pull request #243 from Parquet/parquet_cascading_doc
new a34507d pretty_print_json_for_compatibility_checker
new 18012a0 Merge pull request #250 from
Parquet/pretty_print_json_for_compatibility_checker
new da066e7 [maven-release-plugin] prepare release parquet-1.2.11
new 313c300 [maven-release-plugin] prepare for next development iteration
new 392a801 refactor
new 0888bde address comments
new 67a7a9d Add writer version flag to parquet and make initial changes
for supported parquet 2.0 encodings
new d185966 pom version fix
new cf9a367 Merge pull request #252 from Parquet/refactor_error_handler
new f2e7baa Resolves issue #251 by doing additional checks if Hive
returns "Unknown" as a version.
new 956ad07 Merge pull request #256 from brockn/master
new 4a18684 changes for code review comments - enum as params, shortname
for writerversion
new f61331e In HIVE-5783 we will need a bundle jar to depend on that does
not include the Hive Serde since Hive trunk will contain the Hive Serde.
new a68c8fc Merge pull request #254 from Parquet/parquet_2.0_writer
new c817785 delta int bin pack
new d617084 formatting and license header
new 290385c format
new 74269e4 Merge pull request #253 from Parquet/delta_int
new 978e396 ProtoSchemaConverterUnitTest
new ffcc0b8 Merge pull request #257 from brockn/master
new 2737282 optimize consecutive row groups scans
new a4aef0d Initial commit
new 861016b Removing hadoop-core dependency conflict
new dba65be tests for Input and Output Formats
new 16b2f73 ProtoSchemaConverter Code Style
new 1394236 CodeStyle
new c1b6161 add delta length byte arrays and delta byte arrays encodings
new 5051acc fix minor typo in Encoding reader
new 017d088 minor javadoc changes
new 82b889c Merge pull request #1 from Parquet/master
new 1f75813 junit test for enum schema conversion
new 51ca71a remove old package info
new 52ffcfe remove commented code
new f2e607e add unit test
new 7def49c Adds parquet-jackson module to jackson-dependent modules
new 3013b9f Merge pull request #249 from Parquet/metadata_opt
new 124f2ed Merge branch 'master' into optimize_scan
new 3c91e46 refactor dictionary page handling
new dc7addc update with correct junit imports
new 30adb12 Adds small comment
new 9af4125 turn on parquet 2.0 flags
new e91cda9 Merge pull request #259 from Parquet/delta_strings
new 2b80e47 Merge branch 'master' into add-parquet-jackson-module
new cc8375c Merge pull request #258 from Parquet/optimize_scan
new 1e69167 Renames jackson.shade.prefix property into shade.prefix
new 8926033 Replaces org.codehaus.jackson groupId with corresponding
maven property
new 1ef3e9f Adds README with some explanations
new ee6d882 Renames jackson.shade.prefix property to shade.prefix (part2)
new 87864cb [maven-release-plugin] prepare release parquet-1.3.0
new a609147 [maven-release-plugin] prepare for next development iteration
new f7a9023 correct byte[] storage
new 5997bf5 #projection test
new 96f2300 #projection test - fix - cannot use inner class as mapper
new 985002e Code cleanup
new b273684 ConverterTest
new 99b7e52 new root directory
new 94b2ec0 delete .idea directory
new a717bbf merge
new d708c7d parquet-protobuf added to root pom.xml
new 919db0b Consistent naming protoXYZ
new c8188f3 pom - latest version
new 1f4a9db Code cleanup
new c7c39c3 Repeated Messages test
new 47cd572 Method ProtoParquetInputFormat.setRequestedProjection
signature
new 565638f refactor
new 1d1dd2f 1. refactor: make ThriftSchemaConverter pluggable, can use
ThriftStructConverter or ScroogeStructConvert to convert class to ThriftType 2.
support scrooge read projection pushdown 3. add scroogeReadSupport
new ebc87de format
new 0fb0173 merge master
new b9e272a fix test
new 36c3b66 format
new 7c0d290 Code cleanup
new 63b710d Code cleanup - Enum comparisons
new 8ed45d0 Unnecessary unboxing
new 31e4b06 Url to main parquet repo
new e4e9fc2 Update CHANGES.md
new 0261cd6 upgrade parquet-mr to elephant-bird 4.4
new 622a400 handler only handle ignored field, exception during will be
thrown as SkippableException
new 2e43df5 fixes #265: add semver validation checks to non-bundle builds
new 7dac815 Merge pull request #266 from aniket486/upgrade_eb_4.4
new 79cc35d Merge pull request #267 from
Parquet/handler_only_handle_ignored_fields
new 5f57e46 bump maven-enforcer to 1.3.1 and remove some xml cruft
new 9199f3e [maven-release-plugin] prepare release parquet-1.3.1
new a906f0f [maven-release-plugin] prepare for next development iteration
new 954f39b new ElephantBird (4.3) + correct dependencies.
new b752260 ElephantBird 4.4 + hadoop client dependency
new 063edb4 Merge pull request #260 from
laurentgo/add-parquet-jackson-module
new 55ebcac Bumps parquet-jackson parent version
new 283293f Merge pull request #269 from
laurentgo/fix-parquet-jackson-parent-version
new da96420 Merge branch 'master' of github.com:Parquet/parquet-mr into
add_semver_checks
new 4dae164 unused method in TestUtils
new bc610e5 pom version 1.3.2-SNAPSHOT
new 880da33 ignore jackson packaging changes w.r.t semver
new 3830a15 add maven central as a repo to work around Travis build
issues with semver
new c1e86d8 remove snapshots=false from maven central xml
new 81ab426 Make package java.parquet.proto.converters (mostly) package
protected
new 2207cb9 switches on enums
new 0991475 Code style - small fixes
new f232e77 Make ParquetInputSplit extend FileSplit
new 5c6876a Revert "Make ParquetInputSplit extend FileSplit"
new af880ec Make ParquetInputSplit extend FileSplit
new 6664165 fix MapredParquetInputFormat exception issue caused by
ParquetInputSplit extending FileSplit
new c1298b7 Force <previousVersion>
new 8c8cbde Merge pull request #268 from Parquet/add_semver_checks
new 46b1ad0 fix bug: when enum index being written is the last index
defined in the Enum, a DecodingSchemaMismatchException is thrown. maintain enum
lookup table in EnumType
new 40f9b24 name fix
new ff62194 Merge pull request #271 from
Parquet/fix_bug_enum_last_value_exception
new b2184b8 add 1.3.1
new 8fb0b02 Update CHANGES.md
new 0dfb067 [maven-release-plugin] prepare release parquet-1.3.2
new f012db0 [maven-release-plugin] prepare for next development iteration
new 9bdaff9 Add code of conduct to Readme.md
new ac8968e prettify a few lines
new 471a693 1.3.2
new 81f33a6 Merge remote branch 'upstream/master'
new c00409a exclude ParquetInputSplit from semver check which seems to
have an issue with inherited method check
new 21faa3d Merge pull request #270 from ledbit/master
new a8be812 Readme.md - mark Protobuf support as in dev
new a2691a7 Exception message
new 6763f71 storage of repeated fields without extra level
new bbacdf0 storage of repeated fields without extra level - missing
protobuffer
new b25de98 style: junit.framework to org.junit
new da17462 Matching parquet and pbfields by index
new 3c0ab7a List cannot be empty
new 942cfe2 Dictionary enum conversion
new d00eb4e Merge branch 'master' of github.com:Parquet/parquet-mr into
junit_framework_to_org
new e4329cd move from junit3 to junit4
new 6edfa7e ProtoWriteSupport unit tests
new 8cc4cec New ProtoWriteSupport
new 496e3fd Scalar Converters are part of Message converter
new 5b1b79c javadoc
new 02f7707 ProtoMessageConverter case
new 2d9cf95 make setup calls static in tests
new b929d19 Merge pull request #280 from aniket486/junit_framework_to_org
new c8b7ba8 Merge remote branch 'upstream/master' into protobuf
new 5ffaba9 Maven shade plugin removed
new b1a6774 version 1.3.3-SNAPSHOT + shade plugin
new 024d5ab build fix - deleted package
new 8ecb0b2 first use current thread's classloader to load a class, if
current thread does not have a classloader, use the class's current classloader
to load a class. This will make sure a class not packaged in parquet but on
classpath loaded properly. Otherwise, for example, if you set your own
ReadSupport class to the Configuration object and expect it to be loaded by
ParquetInputFormat, it will fail and throw ClassNotFoundException.
new a1b7a31 use utility method from Configuration class to load class to
avoid ClassNotFoundException
new 83bb4b8 Added ParquetWriter() that takes an instance of Hadoop's
Configuration.
new f2f8e42 Fix to read a new avro schema...
new be43f88 Make setting requested projection and avro schema more
independent, so that you only need to set the Avro schema if it is different to
the writer's schema.
new 01bba92 Support promotion of int, long and float to wider types.
new e29d26b Use a default Avro read schema when none specified in
Parquet-Avro.
new ab54b70 Add tests for reading Parquet files using the default Avro
schema.
new 0185b49 Minor changes following Julien's review
new aadaae5 Revert change making field final that failed compatibility
test.
new 644bf00 Merge pull request #282 from tomwhite/avro-default-read-schema
new 3d7d9ad Merge pull request #292 from esammer/master
new 137b1e2 Merge pull request #289 from allanyan/master
new 68b5314 better error messages, create ParquetScroogeInputFormat class
new 045343d Merge remote-tracking branch 'upstream/master' into protobuf
new 38241cc Ports HIVE-5783 to the parquet-hive module so that patches
can be ported between the two code bases with ease. Note that the code base in
Hive itself should be considered the golden copy and any changes made there and
then ported to the parquet-hive module.
new 083c513 Convert ParquetHiveSerDe back to SerDe interface to support
Hive 0.10
new 1be4d6c bugfix: reorder fields in thrift struct caused writing
nulls. fixed it by keeping track of which fields are being written in each
level, and only write nulls when current level is finished in MessageColumnIO
new 02f50f7 rename var
new 94d703c Fill in default values for new fields in the read schema that
were not in the write schema.
new 0d111b1 remove fieldCount from marker
new 5dccd0c format
new 6496bcc Merge pull request #298 from
Parquet/bugfix_reorder_thrift_fields_causing_writting_nulls
new 76bbf4a [CASCADING] Provide the sink implementation in order to write
some parquet files with ParquetTupleScheme
new cc59a40 Don't deep copy immutable primitive types.
new 808de5d Support field renaming for Avro read schemas, by means of
field aliases.
new de7ae6b Add explicit blank namespaces to account for change in
AVRO-1295 in Avro 1.7.5.
new 3151b2f Merge pull request #299 from
tomwhite/avro-fill-in-default-values
new 29fe0e0 Merge pull request #303 from tomwhite/avro-read-schema-aliases
new cad7f56 Update poms to use thrift.exectuable property.
new c48e8c1 HIVE-6456 - Implement Parquet schema evolution
new 555837a Support field renaming for Avro read schemas, by means of
field aliases.
new 7593e65 Add explicit blank namespaces to account for change in
AVRO-1295 in Avro 1.7.5.
new 8102836 Merge the parquet-tools project into parquet-mr.
new 8cc8bdc Merge the parquet-tools project into parquet-mr.
new 588f868 Merge branch 'merge_parquet_tools' of
github.com:wesleypeck/parquet-mr into merge_parquet_tools
new 712e6d7 fix compile error in previous commit
new 7b0778c Merge pull request #297 from brockn/master
new ed08077 Don't fail if no default value specified for a new value in
the read schema.
new e237fc4 Don't shade Jackson since Avro exposes Jackson classes in its
public API for representing default values for fields.
new c7e892c merge master
new b07b160 Merge pull request #262 from Parquet/scrooge_schema_converter
new 70eada4 NULL tuples cause NPE when writing
new 000659a Merge pull request #1 from jalkjaer/cascading_sink
new 509e268 Better writing of a loop
new 7043a64 Initial int96 implementation.
new 77a355a Extending example and group classes for int96.
new 34b90d7 Removing Int96 class, using Binary instead.
new 56387e3 Remove int96 references from RecordConsumer and Converters.
new 6b2eef9 Delegate fixed and int96 types to convertBINARY.
new d7c7395 Merge Fixed dictionary with Binary dictionary.
new 3fc099f Factoring out common Binary impl in dictionary writer.
new 2403257 Use toStringUsingUTF8 to fix tests.
new af2380f Add NanoTime to example.
new a5d2de1 Add avro constructors with Configuration for #295.
new 603c0dc Fix avro schema conv for arrays of optional type for #312.
new 5e74bbe Add Configuration constructor in thrift writer for #295.
new 8cc3e29 Merge pull request #313 from rdblue/295-add-conf
new 132f75d Merge pull request #293 from rdblue/int96-support
new d356578 Merge pull request #264 from lukasnalezenec/protobuf
new 6063921 Merge pull request #285 from mickaellcr/cascading_sink
new f93c9cf Update cascading doc with Scrooge projection down.
new e392359 Merge pull request #316 from rdblue/thrift-prefix
new 2d5563b Merge pull request #311 from
tomwhite/avro-null-default-values-bug
new b722e7b Merge pull request #314 from
rdblue/312-fix-avro-array-of-optional
new 3cfea0a Merge pull request #310 from wesleypeck/merge_parquet_tools
new deb5e5d oauth based authentication; fix grep change
new 459b29b Merge pull request #319 from Parquet/fix_changelog
new a08d257 Spelling fix
new eb77222 Merge pull request #320 from posix4e/master
new 9899e5b fix filesystem resolution
new 0b5116a Merge pull request #329 from
Parquet/fix_file_system_resolution
new 1920abc compress schemas in input splits
new d0e548f a bit of jar size optimization
new 4246d18 close gzip stream in finally
new 9fdafc0 Merge pull request #333 from Parquet/compress_schemas_in_split
new 737a5d5 issue #290, hive map conversion to parquet schema
new 3d4311f remove originalType check for typeEquals of GroupType and add
tests for HiveSchemaConverter
new ba94119 protobuf dependency version changed from 2.4.1 to 2.5.0
new ee00e61 protobuf dependency version changed from 2.4.1 to 2.5.0 -
commit fix
new 05dea98 issue #324, move ParquetStringInspector to
org.apache.hadoop.hive.serde2.objectinspector.primitive package
new 621cf4e Added statistics to Parquet pages and rowGroups
new 860e123 remove originalType check for typeEquals of GroupType and add
tests for HiveSchemaConverter
new 5ba0ff1 Merge branch 'master' of github.com:tongjiechen/parquet-mr
new 7b5e2ec Addresses some initial comments. Javadocs, removed StatsHelper
new 73d6617 [maven-release-plugin] prepare release parquet-1.4.0
new db13f19 [maven-release-plugin] prepare for next development iteration
new 670c940 Merge branch 'master' of github.com:egonina/parquet-mr into
stats
new 594c47e Added licence to new files
new 616f778 Update CHANGES.md
new 44f31c5 Update CHANGES.md
new 125529b issue #324, move ParquetStringInspector to
org.apache.hadoop.hive.serde2.objectinspector.primitive package
new e8d9763 Merge branch 'master' of
https://github.com/Parquet/parquet-mr into issue324
new 9f43945 Refactored the *Statistics classes to reuse more code. Added
Binary compareTo methods
new 7345536 Merge branch 'issue324' of github.com:tongjiechen/parquet-mr
into issue324
new 07c5472 Merge branch 'issue324' of github.com:tongjiechen/parquet-mr
into issue324
new 82ec584 issue #324 remove additional tab
new 47ff4ab Merge branch 'issue324' of github.com:tongjiechen/parquet-mr
into issue324
new 156b186 remove duplicate code
new c54cad5 compress kv pairs in ParquetInputSplits
new 5207422 Merge pull request #342 from
Parquet/compress_kv_pairs_in_split
new 253eb6a select * from parquet hive table containing map columns runs
into exception. Issue #341.
new e1b4800 set cascading version to 2.5.3
new 6aeaa52 Merge pull request #345 from epishkin/cascading_2.5.3
new f9a8676 stop using strings and b64 for compressed input splits
new ce2301e Merge pull request #346 from
Parquet/compress_kv_pairs_in_split
new 5b8af1f set reading length in ThriftBytesWriteSupport to avoid
potential OOM caused by corrupted data
new f5edd0a Merge pull request #347 from
Parquet/check_read_length_avoid_oom
new 3f5de76 Merge pull request #344 from szehon/master
new 30810ff fix header bug
new 05327c1 Added hashCode() method for Statistics class
new 16d38e2 Fix bug #350, fixed length argument out of order.
new 27f71a1 [maven-release-plugin] prepare release parquet-1.4.1
new bad0012 [maven-release-plugin] prepare for next development iteration
new 93359c0 Added length check for comparing two byte arrays
new f98de75 adding comments
new 41df190 Merge pull request #349 from Parquet/null_header
new b8149e9 ParquetThriftStorer
new a13ae41 cleanup
new 0943978 headers
new 67c1e11 use own test fixtures
new 6417bae 1. upgrade scrooge dep to 3.12.1 2. fix bug when an enum
field is optional, scroogeSchemaConverter would fail
new ddca03c cleanup log messages in tests
new de0bfe3 cleanup log messages in tests
new 9ef1be6 cleanup log messages in tests
new f5c3151 Expose values in SimpleRecord
new f8877f1 cleanup log messages for default codec
new 110fe21 fix test runtime dep missing from pig
new d093f49 reverse codec changes
new 3fad816 Fix output bug during parquet-dump command
new 79a4ac8 Merge pull request #352 from Parquet/ParquetThriftStorer
new 5d06526 generate splits by min max size, and align to HDFS block when
possible
new 796b7dd do not call schema converter to generate projected schema
when the projectionFilterStrubg or projectionSchemaStr is specified
new 3321b67 fix enum to be upper case
new f4a0900 remove unused code
new b55eea0 make ParquetFileWriter throw IOException in invalid state case
new eeae127 Merge pull request #367 from Parquet/ioexception
new 6b7bc54 Merge pull request #366 from
Parquet/avoid_convert_thrift_scrooge_class_when_projection_is_not_specified
new 0a96b2c local variable of hdfsBlock
new dd8c32a fix missing space
new 23958b8 check maxSplit size must be greater or equal to minSplitSize
new 83493c5 maxSplitSize should always be positive
new 2056bfa separate out getParquetInputSplit method in the SplitInfo
class, reduce LOC in the generateSplit method
new fca4cc9 move parseMessageType out of the loop
new 7845cc7 1. remove unused readSupportClass parameter from
generateSplit method; 2. double check split min max to be positive in the
getSplits method; 3. explicit import java.util.xx in test
new 9814332 add more tests so the hdfsSize is not multiple of rowGroup
size
new ac816d9 min split size default to 0
new 83e34be add non-negative check in generateSplits method
new a85b7fd better message
new 4c870b0 Merge pull request #362 from nealsid/master
new 8e348e6 create a getStartingPos in ColumnChunkMetaData
new 00d631c make SplitInfo contain the hdfsBlock
new 9705f49 1. check row groups are sorted; 2. add getStartingPos for
BlockMetadata, which returns the startingPos for the first Column
new 70707e4 use getStartingPos for BlockMetadata, which returns the
startingPos for the first Column
new 05c3e27 ensure SimpleRecord#getValues() is unmodifiable
new 9f672d6 use mid point of a row group to decide to create a split or
not
new bba221d format
new 72dbbdc Merge pull request #353 from
Parquet/bugfix_failed_convert_to_scrooge_struct_when_enum_is_optional
new ac2b15e change name to checkBelongingToANewHDFSBlock
new 93d11c5 Merge pull request #365 from
Parquet/generate_splits_by_min_max_size
new 8aeea14 Merge pull request #335 from tongjiechen/master
new c0b9622 Merge pull request #359 from mping/patch-1
new c9445a3 [maven-release-plugin] prepare release parquet-1.4.2
new 10a0af6 [maven-release-plugin] prepare for next development iteration
new 76d05fa Update CHANGES.md
new 7640224 Adding back the Page() and writePage() methods for
backward-compatibility The methods now pass an empty Stats object downstream
new 3e90b41 Merge branch 'master' of
https://github.com/Parquet/parquet-mr into stats
new 78491a4 adding 1.4.1 as previous version
new f6a2218 configure semver to enforce semantic versioning
new 9a38aec fix metadata concurrency problem
new 6aed528 Merge pull request #381 from Parquet/fix_concurency_problem
new 3f25ad9 [maven-release-plugin] prepare release parquet-1.4.3
new 00e794a [maven-release-plugin] prepare for next development iteration
new 0e334ca Use parameterized to test with and without dictionary.
new 5d1a66a protobuf 2.5 installation script for Travis
new 636457c protobuf 2.5 installation script for Travis - build fix
new af74b79 protobuf 2.5 installation script for Travis - pushd/popd
new 5106593 protobuf 2.5 installation script for Travis - fix
new cb3e514 protobuf 2.5 installation script for Travis - remove make check
new 346b387 Merge pull request #337 from tongjiechen/issue324
new 57b0131 Merge pull request #336 from lukasnalezenec/protobuf
new 50701e7 Merge branch 'master' into tweak_semver
new 163bf6b Add support for DECIMAL type annotation.
new a1d7260 Fix primitive type equality for fixed with different lengths.
new 3af02db Add more tests for type builders.
new 63ffdce Add test for decimal with unsupported primitive types.
new 299e0ca Add Types builder API documentation.
new 73d7558 Simplify Types API by moving repetition.
new 5c80705 Update documentation and formatting.
new 9ef22e6 Fix maximum precision calculation, account for sign bit.
new 86501c2 Add INT32 and INT64 as supported types for DECIMAL.
new acaac8b Implement code review changes.
new c825e89 Remove unchecked casts from Types.Builder.
new 0189ff1 Fix more code review finds.
new db31a49 upgrade semver and add exclude for shaded stuff
new 638c044 update version
new 9f75dd1 Merge pull request #355 from rdblue/decimal
new 0c740e0 remove exclude for Split
new bcd2ec5 remove unnecessary version number in parquet-scrooge
new d08313c add release 1.4.3 to changelog
new 7d335d8 Merge pull request #378 from Parquet/tweak_semver
new 96b94e1 Merge pull request #351 from rdblue/350-fix-int96-dictionary
new ff830a9 previous version to 1.4.2
new 2678e39 Merge branch 'master' of
https://github.com/Parquet/parquet-mr into stats
new c98d8af adding back the parquet-hadoop methods that don't have
statistics parameters, for backward comp
new 041146e Fixed hadoop WriteSupportClass loading
new 882740f return NullCounter when read via Cascading, but not within a
cluster side job
new 05b4e7c Merge pull request #338 from egonina/stats
new cc28822 Added padding for columns not found in file schema
new 70bb0ea fixes for converting from bytes, toString() methods, writing
stats to Footer, unit testing for MAX/MIN_VALUE
new 4d42afb Merge pull request #392 from egonina/stats
new 10dc714 Added test for null padding
new d5a8f9f Merge pull request #389 from dcw-netflix/pad-schema
new 24076a4 Fixed issue with column pruning when using requested schema
new fb7dba1 Updated test and remove shortcut return statement in loader
new b70509d Merge pull request #397 from
dcw-netflix/requested-schema-pruning
new 8091a1b fix null stats
new 7e4346b merging with fix_null_stats branch
new e4991ff Merge branch 'master' of
https://github.com/Parquet/parquet-mr into stats
new 4fee0a7 Bug fix - resetting stats after writing page. Fixed unit test
to test reading footer
new 54f9b10 Cleaning up + testing small & large values
new fd8d18f Merge pull request #399 from egonina/stats
new 7997745 [maven-release-plugin] prepare release parquet-1.5.0
new b2f0fae [maven-release-plugin] prepare for next development iteration
new 01d5157 Update CHANGES.md
new a05afe2 Merge pull request #387 from ambiata/fix-writeclass
new ee0b98c Merge pull request #388 from fs111/master
new b767ac4 Update README.md
new 859b6b4 PARQUET-3: tool to merge pull requests based on Spark
new 9ad5485 PARQUET-2: Adding Type Persuasion for Primitive Types
new 4ad7303 Minor fix
new 9c2fab4 PARQUET-6: Create documentation on how to contribute.
new 2d8ebdb PARQUET-9: Filtering records across multiple blocks
new 5dffe35 PARQUET-4: Use LRU caching for footers in ParquetInputFormat.
new f6c02e2 PARQUET-21: Fix reference to 'github-apache' in dev docs
new fb01048 PARQUET-18: Fix all-null value pages with dict encoding.
new f284238 PARQUET-22: Backport of HIVE-6938 adding rename support for
parquet
new 4a07b3f PARQUET-25. Pushdown predicates only work with hardcoded
arguments.
new 17864df Column index access support
new fc2c29d PARQUET-19: Fix NPE when an empty file is included in a Hive
query that uses CombineHiveInputFormat
new ad32bf0 Add a unified and optionally more constrained API for
expressing filters on columns
new b0e26ee Only call put() when needed in
SchemaCompatibilityValidator#validateColumn()
new 21d871b PARQUET-56: Added an accessor for the Long column type.
new 0793e49 PARQUET-57 - Update dev README to clarify two points
new 0148455 PARQUET-13: The `-d` option for `parquet-schema` shouldn't
have optional argument
new 3a396d3 PARQUET-59: Fix parquet-scrooge test on hadoop-2.
new b86b01b [maven-release-plugin] prepare release parquet-1.6.0rc1
new 08a3c6a [maven-release-plugin] prepare for next development iteration
new 0d497c4 PARQUET-73: Add support for FilterPredicates to cascading
schemes
new 7af955a PARQUET-50: Re-Enable the semver enforcer
new 7b415fa Parquet-70: Fixed storing pig schema to udfcontext for non
projection case and moved...
new 45e5810 PARQUET-69: Committer doc
new 54bb983 PARQUET-62: Fix binary dictionary write bug.
new 792b149 PARQUET-67: mechanism to add extra metadata in the footer
new 84ebe4c PARQUET-66: Upcast blockSize to long to prevent integer
overflow.
new 8474f6d PARQUET-80: upgrade semver plugin version to 0.9.27
new d3cd97a PARQUET-75: Fixed string decode performance issue
new 7a10506 PARQUET-8: bump scrooge-maven-plugin version
new f8b06df do ProtocolEvents fixing only when there is required fields
missing in the requested schema
new 647b8a7 PARQUET-63: Enable dictionary encoding for FIXED.
new 5dafd12 PARQUET-84: Avoid reading rowgroup metadata in memory on the
client side.
new 5f39948 update scala 2.10
new 24119cc upgrade scalatest_version to depend on scala 2.10.4
new f637c44 PARQUET-87: Add API for projection pushdown on the cascading
scheme level
new fbe458f PARQUET-88: fix pre-version enforcement
new 8d878af PARQUET-24: enforce JIRA prefix
new 316b568 [maven-release-plugin] prepare release parquet-1.6.0rc2
new 501e8fe [maven-release-plugin] prepare for next development iteration
new 9cdcf3b PARQUET-94: Fix bug in ParquetScroogeScheme constructor,
minor cleanup
new 3dc223c PARQUET-92: Pig parallel control
new 0eb9637 PARQUET-89: Add hadoop-2 test profile for Travis CI.
new 59c58d0 PARQUET-82: Check page size is valid when writing.
new 0c4f13a PARQUET-101: fix meta data lookup when not using
task.side.metadata
new 3a082e8 PARQUET-90: integrate field ids in schema
new bf20abb PARQUET-96: fill out some missing methods on parquet.example
classes
new 0b17cbe PARQUET-104: Fix writing empty row group at the end of the
file
new da91299 PARQUET-64: Add new OriginalTypes in parquet-format 2.2.0.
new be1222e PARQUET-107: Add option to disable summary metadata.
new 31fb4df PARQUET-105: use mvn shade plugin to create uber jar, support
meta on a folder
new ccfca8f PARQUET-106: Relax InputSplit Protections
new a29815a PARQUET-123: Enable dictionary support in
AvroIndexedRecordConverter
new f1da5e9 PARQUET-121: Allow Parquet to build with Java 8
new 92e6d71 PARQUET-122: make task side metadata true by default
new 251a495 PARQUET-135: Input location is not getting set for the
getStatistics in ParquetLoader when using two different loaders within a Pig
script.
new d105819 PARQUET-132: Add type parameter to AvroParquetInputFormat.
new 3aa6f11 PARQUET-114: Sample NanoTime class serializes and
deserializes Timestamp incorrectly
new ad06e61 PARQUET-52: refactor fallback mechanism
new b5f6a3b PARQUET-140: Allow clients to control the GenericData
instance used to read Avro records
new ccc29e4 PARQUET-117: implement the new page format for Parquet 2.0
new b7a82a9 PARQUET-145 InternalParquetRecordReader.close() should not
throw an exception if initialization has failed
new 8e2ea92 PARQUET-150 Update merge script issue id matching.
new 23db4eb PARQUET-108: Parquet Memory Management in Java
new 52f3240 PARQUET-141: upgrade to scrooge 3.17.0, remove reflection
based field info inspection...
new d70fdbc PARQUET-168: Fixes parquet-tools command line option
description
new 4bf9be3 PARQUET-136: NPE thrown in StatisticsFilter when all values
in a string/binary column trunk are null
new 0751f97 PARQUET-174: Replaces AssertionError constructor introduced
in Java7
new d7dd228 PARQUET-133: Upgrade snappy-java to 1.1.1.6
new e505e1f PARQUET-124: normalize path checking to prevent mismatch
between URI and ...
new b4380f2 PARQUET-142: add path filter in ParquetReader
new 32a9c6d PARQUET-157: Divide by zero fix
new a635f21 Update Travis CI link in README.md.
new 3df3372 PARQUET-111: Updates for apache release
new 8041735 PARQUET-173: Fixes `StatisticsFilter` for `And` filter
predicate
new 668d031 PARQUET-181: Scrooge Write Support (take two)
new 05adc21 PARQUET-177: Added lower bound to memory manager resize
new ce65dfb PARQUET-139: Avoid reading footers when using task-side
metadata
new 807915b PARQUET-116: Pass a filter object to user defined predicate
in filter2 api
new f48bca0 PARQUET-164: Add warning when scaling row group sizes.
new 4f87e0f PARQUET-190: fix an inconsistent Javadoc comment of
ReadSupport.prepareForRead
new f1b5487 PARQUET-191: Fix map Type to Avro Schema conversion.
new c82f703 PARQUET-192: Fix map null encoding
new 36a02dc PARQUET-188: Change column ordering to match the field order.
new fa8957d PARQUET-187: Replace JavaConversions.asJavaList with
JavaConversions.seqAsJavaList
new d084ad2 PARQUET-160: avoid wasting 64K per empty buffer.
new ea81e9a PARQUET-186: Fix Precondition performance problem in
SnappyUtil.
new 998d650 PARQUET-134 patch - Support file write mode
new 2583494 PARQUET-162: ParquetThrift should throw when unrecognized
columns are passed to the column projection API
new 5851e6d PARQUET-197 : fix parquet-cascading not writing parquet
metadata file
new 2d1eaef PARQUET-202 Typo in the connection info in the pom prevents
publishing an RC
new b2623f1 [maven-release-plugin] prepare release parquet-1.6.0rc5
new a7155a8 [maven-release-plugin] prepare for next development iteration
new 12ee6b4 PARQUET-208: Revert PARQUET-197
new 3fc2854 PARQUET-193: Implement nested types compatibility rules in
Avro
new ba43142 [maven-release-plugin] prepare release parquet-1.6.0rc6
new cd89c88 [maven-release-plugin] prepare for next development iteration
new a0c77b6 PARQUET-111: Update headers in parquet-tools, remove NOTICE.
new 5acc6a5 PARQUET-97: make ProtoParquetReader#builder static
new 031a762 PARQUET-172: Add parquet-thrift binary tests.
new b58789c PARQUET-180: Update use of TBinaryProtocol#setReadLength.
new 77826fd PARQUET-215 Discard records with unrecognized union members
in the thrift write path
new 9ee3a16 PARQUET-217 Use simpler heuristic in MemoryManager
new 2e3c053 PARQUET-197 : Gen parquet metadata from cascading
new ec6f200 [maven-release-plugin] prepare release parquet-1.6.0rc7
new cb7f6a8 [maven-release-plugin] prepare for next development iteration
new fd3085e PARQUET-204: add parquet-schema directory support
new b8f5d89 PARQUET-189: Support building parquet with thrift 0.9.0
new 4fea3ea PARQUET-165: Add a new parquet-benchmark module
new 9a92f39 PARQUET-165: Update parquet version in the benchmark module
new 0ab0013 PARQUET-210: add JSON support for parquet-cat
new 4ed0bdf PARQUET-214: Fix Avro string regression.
new 27ba681 PARQUET-230: Add build instructions to README.
new bfb3145 PARQUET-220: Remove unnecessary warnings initializing
ParquetRecordReader
new ff7a486 Revert "PARQUET-220: Remove unnecessary warnings initializing
ParquetRecordReader"
new 4950ad8 PARQUET-242: Fix AvroReadSupport.setAvroDataSupplier.
new f272a6e PARQUET-234: Add ParquetInputSplit methods for compatibility.
new 920192a PARQUET-235: Fix parquet.metadata compatibility.
new b613629 PARQUET-239: Make AvroParquetReader#builder static.
new 828ff75 PARQUET-211: 1.6.0 release changes
new 4f66077 PARQUET-211: Set version to 1.6.0 for release.
new e101917 PARQUET-211: Set version for 1.7.0-incubating development.
new f28aa71 PARQUET-252 : Support nests container types for scrooge
support
new 720b988 Revert "PARQUET-252 : Support nests container types for
scrooge support"
new b10870e PARQUET-23: Rename to org.apache.parquet.
new 7c42398 PARQUET-211: Update version for 1.8.0 development.
new 4f7c704 PARQUET-245: Only run tests in Travis CI if build succeeds.
new 9d744f7 PARQUET-268: Downgrade scrooge-maven-plugin.
new 1be3878 PARQUET-270: Adds a legend for meta output to readme.md
new b287d35 PARQUET-271: Fixes parquet-tools java examples
new 9993450 PARQUET-227 Enforce that unions have only 1 set value,
tolerate bad records in read path
new 98f54c1 PARQUET-175 reading custom protobuf class
new 22c6d08 PARQUET-269: Restore scrooge-maven-plugin to version 3.17.0
new 7fc7998 PARQUET-229 Add a strict thrift projection API with backwards
compat support
new 890b387 PARQUET-252 : support nested container type for
parquet-scrooge
new b8aae90 PARQUET-272: Updates docs description to match data model
new 9500c77 PARQUET-276: Updates CONTRIBUTING file with new repo info
new c7d56cf PARQUET-273 : remove usage of ReflectiveOperationException to
support JAVA6
new e5d9c6c PARQUET-265: Update POM files for Parquet TLP.
new 7680fae PARQUET-254: Fixes exception message
new 136c5ff PARQUET-253: Fixes Javadoc of AvroSchemaConverter
new 1dbcdf2 PARQUET-274: Updates URLs to link against the apache user
instead of Parquet on github
new 60edcf9 PARQUET-278 : enforce non empty group on MessageType level
new a458e1a PARQUET-243: Add Avro reflect support
new 181affd PARQUET-164: Add a counter and increment when parquet memory
manager kicks in
new ded56ff PARQUET-287: Keep a least 1 column from union members when
projecting thrift unions
new 8769d0f PARQUET-262: Restore semver checks.
new dd92a9d PARQUET-223: Add builders for MAP and LIST types
new 213e952 [maven-release-plugin] prepare release parquet-1.8.0rc1
new 33a2202 [maven-release-plugin] prepare for next development iteration
new 4b5cda5 PARQUET-151: Skip writing _metadata file in case of no
footers since schema cannot be determined.
new d6f082b PARQUET-285: Implement 3-level lists in Avro
new 918609f PARQUET-286: Update String support to match upstream Avro.
new 2e62764 PARQUET-266: Add support for lists of primitives to Pig
schema converter
new 4590f14 PARQUET-246: fix incomplete state reset in
DeltaByteArrayWriter.reset()
new faf5421 PARQUET-263: Release changes from parquet-1.7.0 branch
new 5f48f19 PARQUET-309: remove unnecessary compile dependency on
parquet-generator
new 1c16068 PARQUET-264: Remove remaining references to parquet being an
incubator project
new ad44321 PARQUET-297: generate Version class using parquet-generator
new 079bcd0 PARQUET-297: Tests for PR 213 (Version generator)
new 29283b7 PARQUET-314: Fix broken equals implementations
new 89321a2 PARQUET-311: Fix NPE when debug logging metadata
new 412ab96 PARQUET-306: Add row group alignment
new 46448e9 PARQUET-201: Fix ValidTypeMap being overly strict with
respect to OriginalTypes
new 5c2ba72 PARQUET-284: Clean up ParquetMetadataConverter
new cb04562 PARQUET-248: Add ParquetWriter.Builder.
new 1f3e72f PARQUET-317: Fix writeMetadataFile crash when a relative root
path is used
new e6ee42e PARQUET-316: Fix the benchmark module
new e3b9502 PARQUET-251: Binary column statistics error when reuse byte[]
among rows
new 9fde653 PARQUET-320: Fix semver problems for parquet-hadoop.
new c7720ca PARQUET-325: Always use row group size when padding is 0.
new a747456 PARQUET-308: Add ParquetWriter#getDataSize accessor.
new 2f2c8b1 PARQUET-289: Allow ParquetReader.Builder subclasses.
new c334a1b PARQUET-290: Add data model to Avro reader builder
new 013b445 PARQUET-152: Add validation on Encoding.DELTA_BYTE_ARRAY to
allow FIX…
new f4e754e PARQUET-324: row count incorrect if data file has more than
2^31 rows
new 043fcde PARQUET-246: File recovery and work-arounds
new 4c7d752 PARQUET-329: Restore
ThriftReadSupport#THRIFT_COLUMN_FILTER_KEY
new 8f898da PARQUET-292: Update CHANGES.md for 1.8.0.
new 0fda28a [maven-release-plugin] prepare release apache-parquet-1.8.0
new abfe355 [maven-release-plugin] prepare for next development iteration
new fcd5682 PARQUET-279 : Check empty struct in compatibility checker
new be9f3cb PARQUET-331: Surface subprocess stderr in merge script
new 8a2c618 PARQUET-338: Fix pull request example in README
new f79c936 PARQUET-337 handle binary fields in set/map/list in
parquet-scrooge
new 8714dd0 PARQUET-336: Fix ArrayIndexOutOfBounds in
checkDeltaByteArrayProblem
new 8da9456 PARQUET-339: Add Alex Levenson to KEYS file
new 07cefb8 Update CHANGES for 1.8.1 release
new 4aba4da [maven-release-plugin] prepare release apache-parquet-1.8.1
new 1dd5cec [maven-release-plugin] prepare for next development iteration
new 83406b7 PARQUET-340: MemoryManager: max memory can be truncated
new 454fc36 PARQUET-342: Updates to be Java 6 compatible
new b86f68e PARQUET-346: Minor fixes for PARQUET-350, PARQUET-348,
PARQUET-346, PARQUET-345
new 2f956f4 PARQUET-341 improve write performance for wide schema sparse
data
new 3f36b7b PARQUET-362 - Fix parquet buffered writer being oversensitive
to union schema changes
new 01fbf81 PARQUET-343 Caching nulls on group node to improve write
performance on wide schema sparse data
new 2c90a9d PARQUET-356: Update LICENSE files for code from ElephantBird.
new 04f524d PARQUET-361: Add semver prerelease logic.
new 9962a0f PARQUET-335: Remove Avro check for MAP_KEY_VALUE.
new f203d80 PARQUET-363: Allow empty schema groups.
new d24ecb3 PARQUET-376: Tolerate square brackets in PR titles
new 415761d Revert "PARQUET-376: Tolerate square brackets in PR titles"
new 66e39fc PARQUET-375: Update current release version in README.md
new 0637e2f PARQUET-360: Handle all map key types with cat tool's json
dump
new c381968 PARQUET-355: Add Statistics Test for Parquet Columns
new b1ea059 PARQUET-381: Add feature to merge metadata (summary) files,
and control which files are generated
new 5294c64 PARQUET-373: Fix flaky MemoryManager tests.
new 5a45ae3 PARQUET-241: Fix ParquetInputFormat.getFooters() order
new 6b605a4 PARQUET-77: ByteBuffer use in read and write paths
new 440882c PARQUET-364: Fix compatibility for Avro lists of lists.
new 0912987 PARQUET-380: Fix build when using thrift 0.9.0.
new efafa61 PARQUET-378: Add thoroughly parquet test encodings
new 6308304 PARQUET-396: Extend ParquetReader.Builder<T>
new f4918bb PARQUET-398: Updates dev/COMMITTERS.md
new e32aa6f PARQUET-398: Add 'spena' information to dev/COMMITTERS.md
new 14097c6 PARQUET-387: Improve NPE message when avro arrays contain
null.
new f2615d9 PARQUET-349: VersionParser does not handle versions missing
'build' section
new dcd1c33 PARQUET-352: Add object model property to file footers.
new a24d624 PARQUET-305: Update logging to SLF4J.
new 5632640 PARQUET-99: Add page size check properties
new b45c4bd PARQUET-382: Add methods to append encoded data to files.
new 4916903 PARQUET-353: Release compression resources.
new fa7588c PARQUET-334: UT test failure with Pig 0.15
new 367fe13 PARQUET-318: Remove unnecessary object mapper
new fbb2c9e PARQUET-404: Replace [email protected] for HTTPS URL on
dev/README.md to avoid permission issues
new 368588b PARQUET-413: Fix Java 8 test failure.
new 37f72dc PARQUET-212: Implement LIST read compatibility rules in Thrift
new 84b2b74 PARQUET-421: Fix mismatch of javadoc names and method
parameters in m...
new 30ee10d PARQUET-422: Fix a potential bug in MessageTypeParser where
we ignore…
new c38386d PARQUET-393: Update to parquet-format 2.3.1.
new af9fd05 PARQUET-432: Complete a todo for method
ColumnDescriptor.compareTo()
new 5769479 PARQUET-480: Update for Cascading 3.0
new 63d5ae7 PARQUET-495: Fix mismatches in Types class comments
new 06a4689 PARQUET-410: Fix hanging subprocess call in merge script.
new 0a711eb PARQUET-415: Fix ByteBuffer Binary serialization.
new a4acf53 PARQUET-509: Fix args passed to string format calls
new c26fa78 PARQUET-385 PARQUET-379: Fixes strict schema merging
new 6c9ca4d PARQUET-430: Change to use Locale parameterized version of
String.toUpperCase()/toLowerCase
new 944291b PARQUET-431: Make ParquetOutputFormat.memoryManager volatile
new c44f982 PARQUET-529: Avoid evoking job.toString() in ParquetLoader
new fb46b94 PARQUET-397: Implement Pig predicate pushdown
new 1f91c79 PARQUET-528: Fix flush() for RecordConsumer and
implementations
new 4b1ff8f PARQUET-384: Add dictionary filtering.
new e9928c9 PARQUET-571: Fix potential leak in ParquetFileReader.close()
new d402148 PARQUET-581: Fix two instances of the conflation of the min
and max row
new ac62c1c PARQUET-580: Switch int[] initialization in IntList to be lazy
new dc08bb8 PARQUET-584 show proper command usage when there's no
arguments
new 82b8ecc PARQUET-484: Warn when Decimal is stored as INT64 while could
be stored as INT32
new 6b24a1d PARQUET-358: Add support for Avro's logical types API.
new 36ce032 PARQUET-585: Slowly ramp up sizes of int[]s in IntList to
keep sizes small when data sets are small
new 7419443 PARQUET-327. Show statistics in the dump output.
new 8bcfe6c PARQUET-225: Add support for INT64 delta encoding.
new 3dd2210 PARQUET-548: Add EncodingStats.
new 2f22533 PARQUET-569: Separate metadata filtering for ranges and
offsets.
new 39a3cd0 PARQUET-560: Synchronize writes to the finishCalled variable
new c3f3830 PARQUET-372: Do not write stats larger than 4k.
new da69d4b PARQUET-367: "parquet-cat -j" doesn't show all records.
new 1f47025 PARQUET-544: Add closed flag to allow for closeable contract
adherence
new 9c40a7b PARQUET-645: Fix null handling in DictionaryFilter.
new 7f8e952 PARQUET-642: Improve performance of ByteBuffer based read /
write paths
new bd0b5af PARQUET-612: Add compression codec to FileEncodingsIT.
new e036d60 PARQUET-654: Add option to disable record-level filtering.
new 02ce9b0 PARQUET-663: Update README.md
new 42662f8 PARQUET-389: Support predicate push down on missing columns.
new a421d95 PARQUET-540: Fix Cascading 3 build thrift and SLF4J.
new 626014e PARQUET-651: Improve Avro's isElementType check.
new 6a62646 PARQUET-543: Remove unused boundedint package.
new 60b6d5a PARQUET-667: Update committers lists to point to apache
website
new 5c85b8d PARQUET-511: Integer overflow when counting values in column.
new ea402be PARQUET-668 - Provide option to disable auto crop feature in
dump
new 76a2ac8 PARQUET-669: allow reading footers from provided file listing
and streams
new b301d12 PARQUET-667: Add back + update committers table
new 30aa910 PARQUET-601: Add support to configure the encoding used by
ValueWriters
new c8d78b2 PARQUET-146: Move Parquet to Java 7
new 898f3d0 PARQUET-400: Replace CompatibilityUtil with
SeekableInputStream.
new 255f108 PARQUET-460: merge multi parquet files to one file
new 6dad1e3 PARQUET-696: fix travis build. Broken because google code
shut down
new 044de16 PARQUET-623: Fix DeltaByteArrayReader#skip.
new e54ca61 PARQUET-660: Ignore extension fields in protobuf messages.
new b59be86 PARQUET-674: Add InputFile abstraction for openable files.
new 07a42d3 PARQUET-726: Increase max difference of
testMemoryManagerUpperLimit to 10%
new e6da0f6 PARQUET-685 - Deprecated ParquetInputSplit constructor passes
paramet…
new a0e6cc3 PARQUET-727: Ensure correct version of thrift is used
new 06768d9 PARQUET-740: Introduce editorconfig
new de99127 PARQUET-686: Do not return min/max for the wrong order.
new 59ec4f0 PARQUET-743: Fix DictionaryFilter when compressed
dictionaries are reused.
new 31d0d4d PARQUET-392: Update CHANGES.md for 1.9.0.
new 2a99abf [maven-release-plugin] prepare release apache-parquet-1.9.0
new 27b9934 [maven-release-plugin] prepare for next development iteration
new 0116aa7 PARQUET-392: Fix svn log message in source-release.sh.
new 1058b7d PARQUET-392: Fix staging instructions in prepare-release.sh.
new ece4b70 PARQUET-751: Add setRequestedSchema to ParquetFileReader.
new 38262e2 [maven-release-plugin] prepare release apache-parquet-1.9.0
new aa416b5 [maven-release-plugin] prepare for next development iteration
new df9d8e4 PARQUET-423: Replace old Log class with SLF4J Logging
new e5cd652 PARQUET-753: Fixed GroupType.union() to handle original type
new 0ed977a PARQUET-768: Add Uwe L. Korn to KEYS
new cf99160 PARQUET-755: create parquet-arrow module with schema converter
new 4453aa3 PARQUET-765 - Upgrade Avro to 1.8.1
new 09d28fe PARQUET-783: Close the underlying stream when an
H2SeekableInputStream is closed
new 7987a54 PARQUET-786: 'java -jar', not 'java jar' closes #377, #374
new 4fd34e6 PARQUET-220: Unnecessary warning in
ParquetRecordReader.initialize
new 98c2769 PARQUET-321: Default maximum block padding to 8MB.
new 71cff7c PARQUET-791: Add missing column support for
UserDefinedPredicate
new 89e0607 PARQUET-801: Allow UserDefinedPredicates in DictionaryFilter
new f68dbc3 PARQUET-825: Static analyzer findings (NPEs, resource leaks)
new 6fb6085 PARQUET-822: Upgrade java dependencies
new 3634821 PARQUET-806: Parquet-tools silently suppresses error messages
new 2fd62ee PARQUET-772: Fix locale-specific test failures.
new 70f2881 PARQUET-665 Adds support for proto3
new a703ee7 PARQUET-969: Update parquet-tools to convert Decimal datatype
to BigD…
new fd7cfed PARQUET-196: parquet-tools command for row count & size
new 1de41ef PARQUET-852: Slowly ramp up sizes of byte[] in
ByteBasedBitPackingEncoder
new 9491d7a PARQUET-990 More detailed error messages in footer parsing
new 9d58b6a Parquet-884: Add support for Decimal datatype to Parquet-Pig
record reader
new 2d3203b PARQUET-1005: Fix DumpCommand parsing to allow column
projection
new 352b906 PARQUET-1026: allow unsigned binary stats when min == max
new df9f8d8 PARQUET-1024: allow for case insensitive parquet-xxx prefix
in PR title
new ddbeb4d PARQUET-777: Add Parquet CLI.
new d55a572 PARQUET-1133 Add int96 support by returning bytearray, Skip
originalType comparison for map types when originalType is null
new 328c5de PARQUET-1115: Warn users when misusing parquet-tools merge
new 170cfa7 PARQUET-1152: Parquet-thrift doesn't compile with Thrift 0.9.3
new c532b0e PARQUET-1153: Parquet-thrift doesn't compile with Thrift
0.10.0
new ba7b8ba PARQUET-1149: Update Avro to 1.8.2
new 132b2a8 PARQUET-1143: Update to Parquet format 2.4.0.
new 81f4801 PARQUET-1156: Address dev/merge_parquet_pr.py problems.
new 8bfd9b4 PARQUET-1142: Add alternatives to Hadoop classes in the API
new da3e8eb PARQUET-357: Parquet-thrift generates wrong schema for Thrift
binary fields
new 9191fbd PARQUET-1141: Fix field ID handling
new 2adb657 PARQUET-1077: Use long key ids in KEYS file
new 3783ca4 PARQUET-1185: TestBinary#testBinary unit test fails after
PARQUET-1141
new 4d996d1 PARQUET-386: Printing out the statistics of metadata in
parquet-tools
new c6764c4 PARQUET-1025: Support new min-max statistics in parquet-mr
new b80b184 PARQUET-1197: Log rat failures
new 878ebcd PARQUET-1191: Type.hashCode() takes originalType into account
but Type.equals() does not
new 89aeec0 PARQUET-1170: Logical-type-based toString for proper
representeation in tools/logs
new 6e0cc72 PARQUET-1065: Deprecate type-defined sort ordering for INT96
type.
new 6a4bbe9 PARQUET-1198: Bump java source and target to java8
new 445cb9d PARQUET-1215: Add getFooter to ParquetWriter.
new ad80bfe PARQUET-1208: Occasional endless loop in unit test
new 8bbc6cb PARQUET-787: Limit read allocation size
new b82d962 PARQUET-1217: Incorrect handling of missing values in
Statistics
new 3d2d4fd PARQUET-1135: upgrade thrift and protobuf dependencies
new 0a86429 PARQUET-1246: Ignore float/double statistics in case of NaN
new a7ca605 PARQUET-1258: Update scm developer connection to github (#462)
new d54fad8 PARQUET-1183: Add Avro builders using InputFile and
OutputFile. (#460)
new 12bbaf3 PARQUET-1263: If file has a config, use it for
ParquetReadOptions. (#464)
new 9261c28 PARQUET-1189: Update CHANGES.md for 1.10.0 release.
new d61d221 PARQUET-1264: Fix javadoc warnings for Java 8.
new 150c578 PARQUET-1264: Fix javadoc 8 problem in VersionGenerator.
new 0d55abd RQUET-1264: Fix javadoc warnings for Java 8.
new ce4d1c9 PARQUET-1258: Update scm developer connection to github HTTPS.
new 031a665 [maven-release-plugin] prepare release apache-parquet-1.10.0
new 625aa51 PARQUET-1512: Set version to 1.10.1-SNAPSHOT.
new 4f945b9 PARQUET-1309: Parquet Java uses incorrect stats and
dictionary filter properties (#490)
new 50b1f47 PARQUET-1510: Fix notEq for optional columns with null
values. (#603)
new 68125a7 PARQUET-1512: Update CHANGES.md for 1.10.1.
new 8ad44a9 [maven-release-plugin] prepare release apache-parquet-1.10.1
The 1888 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.