Re: Arrow sync in 15 min

Julien Le Dem Wed, 31 May 2017 11:07:00 -0700

 Notes:

Attendees/agenda building
Wes (TwoSigma):
 - REST API
 - Roadmap
 - communicating with the community
Uwe (Blue Yonder):
 - git tag for versioning
Julien (Dremio):
 - Timestamp
 - REST API
 - Roadmap


Discussion:
 - git tag for versioning
    - development package version names are derived from the latest tag in
master's history plus the commit count since that tag.
    - since the release tag sits on a branch, master's dev version is derived
from an older tag, which is misleading
    - options:
       - add a tag {release version}.post on the first commit after the
release to get a better dev version string
       - rebase master on top of the last release (0.4)
    - we decided to rebase master (the only change is adding the commit
that updates the version number in pom files)
 - Timestamp in Arrow and Parquet:
    - Both support “timezone naive” timestamps (aka “timestamp without
timezone” in SQL)
        - in Arrow when timezone field is missing in Timestamp type:
https://github.com/apache/arrow/blob/5899800f53f3c3fffc0db95294c4f0eb0e556228/format/Schema.fbs#L117
        - in Parquet (proposed PR) when isAdjustedToUTC is false:
https://github.com/apache/parquet-format/pull/51/files#diff-0f9d1b5347959e15259da7ba8f4b6252R242
    - They also both support “timezone aware” timestamps (aka “timestamp
with timezone” in SQL)
        - in Arrow when the timezone field is present with the original
timezone.
        - in Parquet when isAdjustedToUTC is true
    - So Arrow carries more information (the original timezone), and it
requires the timezone field to be set for timezone-aware data, since its
absence means “timezone naive”
    - conclusion:
        - when writing to parquet we should use isAdjustedToUTC = false
only if there is no knowledge of the timezone
        - when reading from parquet we will populate timezone with UTC
when isAdjustedToUTC == true (and leave it missing otherwise)
 - REST API:
   - review doc here:
https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#
 - Roadmap:
    - todo: blog post describing the direction of Arrow
    - items include:
       - REST API and generalizing messaging
       - C++ analytics library for interacting with Arrow memory. Tools for
wrapping existing data structures (e.g. an array of doubles)
       - Arrow for GPU
       - Arrow ODBC interface: turbodbc
       - Spark integration improvements: group UDFs, etc.
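
To illustrate the git tag discussion above, here is a minimal sketch of how a
dev version string can be derived from the latest reachable tag plus the commit
count. It assumes input shaped like `git describe --tags --long` output
(`<tag>-<count>-g<sha>`). The `dev_version` helper, the `.dev<count>` suffix,
and the tag names, counts, and hashes are all hypothetical, not the project's
actual build logic.

```python
import re


def dev_version(describe: str) -> str:
    """Build a dev version string from '<tag>-<count>-g<sha>' text.

    Hypothetical helper: the '.dev<count>' suffix is an assumed convention.
    """
    match = re.fullmatch(r"(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)", describe)
    if match is None:
        raise ValueError(f"unexpected describe output: {describe!r}")
    tag = match["tag"]
    count = int(match["count"])
    # Zero commits since the tag means we are exactly on the tagged commit.
    if count == 0:
        return tag
    return f"{tag}.dev{count}"


# Release tag lives on a branch: master only sees the older tag (made-up values).
before_rebase = dev_version("0.3.0-150-gabc1234")
# After rebasing master on top of the release commit (made-up values).
after_rebase = dev_version("0.4.0-3-gdef5678")
print(before_rebase, after_rebase)
```

With the release tag on a branch, the master build still reports a version
based on the older tag even though a newer release exists. Once master sits on
top of the release commit, the dev version is based on the release tag.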

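The timestamp conclusion above can be sketched as two small mapping functions.
This is an illustration in plain Python, not the Arrow or Parquet library API:
the function names are hypothetical, and `None` stands in for a missing
timezone field on the Arrow Timestamp type.

```python
from typing import Optional


def parquet_is_adjusted_to_utc(arrow_timezone: Optional[str]) -> bool:
    """Writing to Parquet: isAdjustedToUTC is false only when the Arrow
    timestamp has no timezone (timezone naive)."""
    return arrow_timezone is not None


def arrow_timezone_from_parquet(is_adjusted_to_utc: bool) -> Optional[str]:
    """Reading from Parquet: populate the timezone with UTC when
    isAdjustedToUTC is true, otherwise leave it missing."""
    return "UTC" if is_adjusted_to_utc else None


# Timezone-naive data round-trips unchanged.
naive = arrow_timezone_from_parquet(parquet_is_adjusted_to_utc(None))
# Timezone-aware data comes back as UTC; the original zone name is not kept.
aware = arrow_timezone_from_parquet(parquet_is_adjusted_to_utc("America/Los_Angeles"))
print(naive, aware)
```

The round trip shows the asymmetry noted in the discussion: Arrow records the
original timezone, while the Parquet flag only records whether values are
adjusted to UTC, so a timezone-aware column is read back with UTC as its
timezone.
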
On Wed, May 31, 2017 at 9:16 AM, Julien Le Dem <[email protected]> wrote:

> The arrow sync is at 9:30 am PT today on google hangout
> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>
> --
> Julien
>



-- 
Julien
