Notes:
Attendees/agenda building
Wes (TwoSigma):
- REST API
- Roadmap
- communicating with the community
Uwe (Blue Yonder):
- git tag for versioning
Julien (Dremio):
- Timestamp in Arrow and Parquet
- REST API
- Roadmap
Discussion:
- git tag for versioning
- development package version names are derived from the latest tag in
master's history plus the number of commits since that tag.
- since the release tag is on a branch (not in master's history), the
dev version is computed from an older tag and is misleading
- options:
- add a tag {release version}.post on the first commit after the
release to get a more accurate dev version string
- rebase master on top of the last release (0.4)
- we decided to rebase master (the only difference is the commit
that updates the version numbers in the pom files)
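To make the versioning problem concrete, here is a minimal Python sketch of the rule described above ("latest tag in master's history + commit count"). It assumes a simplified linear-history model where each commit has a single parent, and a hypothetical `{tag}.dev{N}` version format; real tools such as `git describe` or setuptools_scm differ in the details. It shows that while the 0.4.0 tag lives only on the release branch, master's dev version is based on the older 0.3.0 tag, and that after rebasing master onto the release commit it is based on 0.4.0.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: commit id -> parent id (None for the root commit).
Parents = Dict[str, Optional[str]]
# Hypothetical model: commit id -> tag name.
Tags = Dict[str, str]


def describe(head: str, parents: Parents, tags: Tags) -> Tuple[str, int]:
    """Walk back from head; return (nearest tag in history, commits since it)."""
    distance = 0
    commit: Optional[str] = head
    while commit is not None:
        if commit in tags:
            return tags[commit], distance
        commit = parents[commit]
        distance += 1
    raise ValueError("no tag reachable from head")


def dev_version(head: str, parents: Parents, tags: Tags) -> str:
    tag, distance = describe(head, parents, tags)
    # Assumed format for illustration only, loosely modelled on PEP 440 .devN.
    return f"{tag}.dev{distance}"


tags = {"a": "0.3.0", "r": "0.4.0"}

# Before: release commit "r" (tagged 0.4.0) branches off "b"; master HEAD is "d".
before = {"a": None, "b": "a", "c": "b", "d": "c", "r": "b"}
print(dev_version("d", before, tags))   # based on the older 0.3.0 tag

# After rebasing master on the release: "c2"/"d2" are the rebased commits.
after = {"a": None, "b": "a", "r": "b", "c2": "r", "d2": "c2"}
print(dev_version("d2", after, tags))   # now based on 0.4.0
```

The rebase works because it puts the tagged release commit into master's history, so the nearest reachable tag becomes the latest release rather than the previous one.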
- Timestamp in Arrow and Parquet:
- Both support “timezone naive” timestamps (aka “timestamp without
time zone” in SQL)
- in Arrow: when the timezone field of the Timestamp type is missing:
https://github.com/apache/arrow/blob/5899800f53f3c3fffc0db95294c4f0eb0e556228/format/Schema.fbs#L117
- in Parquet (proposed PR): when isAdjustedToUTC is false:
https://github.com/apache/parquet-format/pull/51/files#diff-0f9d1b5347959e15259da7ba8f4b6252R242
- They also both support “timezone aware” timestamps (aka “timestamp
with time zone” in SQL)
- in Arrow: when the timezone field is present, holding the original
timezone
- in Parquet: when isAdjustedToUTC is true
- So Arrow carries more information (the original timezone), and it
needs it, since the absence of the timezone field means “timezone naive”
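The difference between the two kinds of timestamp can be sketched with Python's standard `datetime` module. The encoding here (int64 microseconds since the Unix epoch, with naive values stored as wall-clock time encoded as if it were UTC) is an assumption chosen for illustration, not a quote of either spec. The sketch also shows why Arrow needs the extra timezone field: once an aware value is normalized to UTC, the original offset can no longer be recovered from the integer.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MICROSECOND = timedelta(microseconds=1)


def encode_aware(dt: datetime) -> int:
    # "Timezone aware": normalize the instant to UTC (assumed encoding).
    if dt.tzinfo is None:
        raise ValueError("aware encoding needs a timezone")
    return (dt - EPOCH) // MICROSECOND


def encode_naive(dt: datetime) -> int:
    # "Timezone naive": keep the wall-clock fields, encoded as if UTC (assumed).
    return (dt.replace(tzinfo=timezone.utc) - EPOCH) // MICROSECOND


# 9:30 am PDT (UTC-07:00), e.g. the time of this sync.
meeting = datetime(2017, 5, 31, 9, 30, tzinfo=timezone(timedelta(hours=-7)))

aware = encode_aware(meeting)
naive = encode_naive(meeting)

# The aware value is the instant 16:30 UTC; the naive value is "09:30" as-is.
print(aware - naive)

# The same instant written in UTC encodes identically: the -07:00 is lost,
# which is why Arrow keeps the original timezone in the type.
print(encode_aware(meeting.astimezone(timezone.utc)) == aware)
```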
- conclusion:
- when writing to Parquet we should use isAdjustedToUTC = false
only if the timezone is unknown (the Arrow timezone field is missing)
- when reading from Parquet we will set the Arrow timezone to UTC
when isAdjustedToUTC == true (and leave it missing otherwise)
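The conclusion above amounts to a small mapping in each direction. The following is a minimal Python sketch of that rule only; the function names are hypothetical and this is not the Arrow or Parquet API. It also shows that a round trip keeps "aware vs naive" but replaces any original timezone with UTC, since Parquet stores only the flag.

```python
from typing import Optional


def parquet_is_adjusted_to_utc(arrow_timezone: Optional[str]) -> bool:
    # Writing (hypothetical helper): only a missing Arrow timezone, i.e. no
    # knowledge of the timezone, maps to isAdjustedToUTC = false.
    return arrow_timezone is not None


def arrow_timezone_from_parquet(is_adjusted_to_utc: bool) -> Optional[str]:
    # Reading (hypothetical helper): isAdjustedToUTC == true -> "UTC";
    # otherwise leave the timezone field missing.
    return "UTC" if is_adjusted_to_utc else None


for tz in [None, "UTC", "America/Los_Angeles"]:
    flag = parquet_is_adjusted_to_utc(tz)
    print(tz, "->", flag, "->", arrow_timezone_from_parquet(flag))
```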
- REST API:
- review the doc here:
https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#
- Roadmap:
- TODO: write a blog post describing the direction of Arrow
- among those:
- REST API and generalizing messaging
- C++ analytics library for interacting with Arrow memory; tools for
wrapping existing data structures (e.g. an array of doubles)
- Arrow for GPUs
- Arrow ODBC interface: turbodbc
- Spark integration improvements: group UDFs, etc.
On Wed, May 31, 2017 at 9:16 AM, Julien Le Dem wrote:
> The arrow sync is at 9:30 am PT today on google hangout
> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>
> --
> Julien
>
--
Julien