Logical types are more for the object models than they are for users,
but the nice thing is that they are optional. So if an object model
can't support a type, the user can get the underlying data and still use it.
For example, Thrift doesn't have date/time types. So the object model
can only return the underlying data for a user to convert to a day.
Avro, on the other hand, is about to release support for date/time types
and the support in Parquet will implement the same conversions.
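To illustrate the Thrift case above, here is a minimal sketch of the conversion a user would do by hand when the object model returns only the underlying value. It assumes the column holds an int32 count of days since the Unix epoch, which is how Parquet's DATE annotation is defined. The function names are hypothetical and are not part of any Parquet API.

```python
from datetime import date, timedelta

# Assumption: the raw value is an int32 number of days since 1970-01-01,
# matching Parquet's DATE annotation.
EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Convert a raw days-since-epoch value (hypothetical helper) to a date."""
    return EPOCH + timedelta(days=days)


def date_to_days(d: date) -> int:
    """Inverse conversion, e.g. for producing the raw value when writing."""
    return (d - EPOCH).days


if __name__ == "__main__":
    raw = 16657  # value as it would come back from an un-annotated reader
    print(days_to_date(raw))
    print(date_to_days(date(2015, 8, 10)))
```

Negative values work as well, since timedelta handles days before the epoch.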
One thing we don't do very well is allow users to annotate types when
they're writing. We should be looking into that pretty soon.
rb
On 08/10/2015 07:32 AM, Thanh Do wrote:
Thanks Julien! Got it.
A follow-up question. Are logical type annotations supposed to be hints? I
mean, if some users generate a Parquet file using Hive (via the external table
mechanism), then consume it using Impala (again, through an external table),
shouldn't there be some standardized annotations between the two systems?
Or are the users responsible for creating the correct schema types
that map correctly to Parquet primitive types, regardless of the
annotations?
Thanh
On Thu, Aug 6, 2015 at 5:11 PM, Julien Le Dem <[email protected]>
wrote:
FileMetaData.schema is a list of SchemaElements.
SchemaElement.converted_type contains the annotation.
If you use parquet-mr to access the schema, look at the originalType field:
https://github.com/apache/parquet-mr/blob/2f956f46580e5b4752173e885d37a20fe31a78d8/parquet-column/src/main/java/org/apache/parquet/schema/Type.java#L113
On Thu, Aug 6, 2015 at 2:12 PM, Thanh Do <[email protected]> wrote:
Hi all,
From the documentation, I understand that Parquet supports a small number
of primitive types and it is up to the reader to interpret these primitive
types as a potentially broader set of logical types.
Indeed, ConvertedType annotations can be used to specify such an
interpretation. According to the documentation (
http://parquet.apache.org/documentation/latest/): "Annotations are stored
as a ConvertedType in the file metadata"
But looking at the FileMetaData.java code (
http://grepcode.com/file/repo1.maven.org/maven2/com.twitter/parquet-format/2.2.0/parquet/format/FileMetaData.java#FileMetaData._Fields
)
I cannot find an API to get the annotation information.
Am I missing something here? How do I set/get these annotations?
Regards,
Thanh
--
Ryan Blue
Software Engineer
Cloudera, Inc.