Notes:
Wes (Two Sigma):
Status:
- IO layer
- Benchmark
- Python -> Arrow with zero-copy. Some overhead in pandas; discussed for pandas 2.0
Priority:
- integration tests:
- generate some hand-coded datasets to validate on each side (Java/C++)
- cross validation
- csv files?
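A minimal sketch of the cross-validation idea above, assuming a hand-coded dataset is kept as a small JSON-like structure that each side reads and reports back. The layout, the names REFERENCE and compare_datasets, and the JSON round trip standing in for a Java or C++ reader are all hypothetical; nothing here is an agreed Arrow format.

```python
import json

# Hand-coded reference dataset: a schema plus column values, None for nulls.
# This layout is an assumption for illustration, not an Arrow-defined format.
REFERENCE = {
    "schema": [
        {"name": "id", "type": "int32"},
        {"name": "label", "type": "utf8"},
    ],
    "columns": {
        "id": [1, 2, None, 4],
        "label": ["a", None, "c", "d"],
    },
}


def compare_datasets(expected, actual):
    """Return a list of human-readable mismatches between two datasets."""
    problems = []
    if expected["schema"] != actual["schema"]:
        problems.append("schema differs")
    for field in expected["schema"]:
        name = field["name"]
        want = expected["columns"][name]
        got = actual["columns"].get(name)
        if got is None:
            problems.append(f"column {name!r} is missing")
        elif want != got:
            problems.append(f"column {name!r}: expected {want}, got {got}")
    return problems


# Stand-in for what one implementation reports after reading the reference file;
# here the data is simply round-tripped through JSON text.
roundtripped = json.loads(json.dumps(REFERENCE))
mismatches = compare_datasets(REFERENCE, roundtripped)
print("OK" if not mismatches else mismatches)
```

In a real setup each implementation would read the same reference file and the comparison would run on what each one reports, so a disagreement points at one side's reader or writer.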
Uwe (Blue Yonder):
- Arrow-Parquet integration:
- chasing a memory issue when reading from Parquet to Arrow
- good performance reading from Parquet to Arrow
- need to add a GitHub hook for external services.
- IPC
Amitabha (Apple):
- Experience
- Cassandra contributor
- Apple
- Research at UC Berkeley in data science
- looking forward to contributing.
Julien (Dremio):
- been working on Parquet-Arrow integration:
- schema conversion
- nested reader
- pending PRs on Parquet.
- Java roundtrip tool:
- fixing a bug in reading
- talking on Friday in NY about Parquet and Arrow
Actions:
- Open JIRAs for integration testing subtasks
- Create arrow-integration testing reference files.
- Request GitHub integration hook
On Thu, Nov 3, 2016 at 9:57 AM, Julien Le Dem <[email protected]> wrote:
> it is happening now.
>
>
> On Thu, Nov 3, 2016 at 7:44 AM, Julien Le Dem <[email protected]> wrote:
>
>> Every other week we do an Arrow sync over google hangout:
>> https://plus.google.com/hangouts/_/dremio.com/arrow
>> Thursday 10am PT
>> --
>> Julien
>>
>
>
>
> --
> Julien
>
--
Julien