Meeting Minutes from 2023-07-19 Iceberg Sync

Brian Olsen Thu, 26 Oct 2023 14:25:09 -0700

Hey Iceberg Nation,
Everyone is welcome to attend syncs. Subscribe to this calendar
<https://calendar.google.com/calendar/embed?src=3905d492f1b450ba0712f2ae6afa76eb757f13d85220cc03aa4527885adc5629%40group.calendar.google.com&ctz=Asia%2FShanghai>
to receive a notification.

Note: These meeting notes are backdated, as I forgot to post them here
earlier.

2023-07-19 (Meeting Recording
<https://www.youtube.com/watch?v=BJwgLWrCIHI> ⭕ )

Highlights
- PyIceberg 0.4.0 is out
- Python Avro reads are 18% faster
- Python concurrency updated for AWS Lambda
- Added Avro writes to Python
- Fixed Spark deleteWhere with WAP branch
- Added registerTable to REST catalog
- FLIP-27 Flink source switched to JSON parser for FileScanTask

Releases
- Please vote on 1.3.1
- Java 1.4.0
- Targeting August for RC
- Anton volunteered to be release manager (RM)
- Distributed planning
- Row-level operation updates: MoR schema pruning, etc.
- Dynamic pruning stretch goal, mainly targeting MoR
- Python 0.5.0

Discussion
- View API issues (https://github.com/apache/iceberg/pull/7992)
- Should Projections take in a schema rather than a spec? Are there
  issues evaluating filters with time travel because we use the wrong
  schema? (Came up while looking at this issue:
  https://github.com/apache/iceberg/issues/7774)
- Gradle version catalog support
- Applying spotless for Scala code
- Add Golang Iceberg to repo?

AI-generated chapter summaries:

0:00 <https://www.youtube.com/watch?v=BJwgLWrCIHI&t=0s>
Chapter 1: The team discussed updates and progress on both the Python and
Java sides, including new features, performance improvements, and upcoming
releases. They also talked about the View API and the need to deprecate and
move certain interfaces.

10:40 <https://www.youtube.com/watch?v=BJwgLWrCIHI&t=640s>
Chapter 2: The team discussed the issue of generated classes appearing in
the API package and decided to break those classes and improve the
generation process in the future. They also discussed the problem of
projections binding expressions to the schema and agreed that passing the
schema to the projections would be a better solution.

21:37 <https://www.youtube.com/watch?v=BJwgLWrCIHI&t=1297s>
Chapter 3: Eduard raised awareness about updating the dependency versioning
plugin and ensuring compatibility with Dependabot. Anton expressed concerns
about applying spotless for Scala code due to differences with Spark, but
agreed to revisit the topic once Spark 3.5 is released. Matt proposed a
Golang implementation of Iceberg and discussed the possibility of
integrating it into the main repository, with separate versioning and
considerations for release scripts and CI.

31:52 <https://www.youtube.com/watch?v=BJwgLWrCIHI&t=1912s>
Chapter 4: Matt and Steven discussed the process of moving the code into
the foundation, including licensing and practical issues. They decided to
start with small PRs to get more eyes on the code and build understanding,
with Jacob offering to assist.

42:10 <https://www.youtube.com/watch?v=BJwgLWrCIHI&t=2530s>
Chapter 5: Matt and Rusty discussed the need for a common representation of
tasks in Arrow and the desire to create a Substrait plan for Iceberg scans
with pushdown and deletes. They aimed to simplify the integration of
different languages and make querying Iceberg tables more efficient.

51:29 <https://www.youtube.com/watch?v=BJwgLWrCIHI&t=3089s>
Chapter 6: Matt, Fokko, and others discussed the benefits of representing
plans as Substrait plans and the need for correct column projection in
Arrow. They also mentioned the possibility of opening an issue to
coordinate on implementing Iceberg column resolution in C++.
