Re: InfluxDB video: "Parquet is a standard like SQL is a standard"

Andrew Lamb Tue, 19 Nov 2024 13:06:23 -0800

👋 I happened to work with Paul building InfluxDB 3.0 so I have some
perspective on this matter

One example of "Parquet being a standard like SQL is standard" we hit is
nanosecond timestamps. While they were added to the spec in 2018[1] they
still aren't supported by major systems (ahem Spark, Iceberg) which means
we can't interoperate with those systems unless we rewrite data to use
millisecond timestamps.

We hit the same issue with  `BYTE_STREAM_SPLIT`  encoding[2] which wasn't
supported by pyspark as I remember (even though it was quite effective for
our timestamp data).

This is entirely an ecosystem / people / momentum problem in my opinion
(not a technical one)

I have some thoughts [3] on how to help (compatibility matrix, define what
"compatible" means, etc) but I haven't been able to get people excited
about it 🤷)

Hope that offers some context,
Andrew

[1] https://github.com/apache/parquet-format/pull/102
[2]: https://github.com/apache/parquet-format/pull/144
[3]: https://github.com/apache/parquet-format/issues/441

On Tue, Nov 19, 2024 at 1:50 PM Steve Loughran <ste...@cloudera.com.invalid>
wrote:

> I watched a really good video recently where Paul Dix described the
> iterations of influx DB with really open reviews about the issues they've
> encountered
>
> One comment "Parquet is a standard like SQL is a standard" called out
> inconsistent handling of Parquet data between their service and others.
>
> https://youtu.be/AGS4GNGDK_4?si=27CvjJ5O69cI8xZZ&t=3867
>
> Does anyone know what problems were encountered? Are there open issues? And
> sample files?
>
> I'm curious about this stuff -and debugging some of this seems a really
> good way to learn about the lower level details of the formats and how
> libraries work with that.
>
> Nobody wants to be accused as a "standard like SQL" in a way that is not
> praise.
>

Re: InfluxDB video: "Parquet is a standard like SQL is a standard"

Reply via email to