Hi Elliot,

Given your description, I agree that extension types sound like a good
fit, similar to geoarrow[1] for geospatial data, where extra metadata[2] is
needed to interpret the underlying types (in your case, factor and offset).

Andrew

[1] https://github.com/geoarrow/geoarrow
[2] https://arrow.apache.org/docs/format/CanonicalExtensions.html#geoarrow

On Sat, Jan 6, 2024 at 3:20 AM Morrison-Reed Elliot (BEG/PJ-EDS-NA)
<elliot.morrison-r...@us.bosch.com.invalid> wrote:

> Background
>
> I have been looking into using parquet files for storing and working with
> automotive data. One interesting thing about automotive data is that most
> communication happens on the CAN bus where we have extremely limited
> bandwidth. In order to encode "physical" values in a very space efficient
> way, we use linear conversion formulas that look like
> "phys = (raw * factor) + offset". This gives implicit range and resolution
> limits, but that is often just fine when we are representing a physical
> property.
>
> Example 1:
>
> We have a throttle that can be anywhere from 0-100% and we want to fit that
> value into 1 byte. So we would use a formula like:
>
>     phys = (raw * 0.39215) + 0
>
> Example 2:
>
> We want to record ambient temperature of the vehicle. Resolution of 1
> degree is fine. Also, temperatures below -40 and above 215 degrees C are
> not particularly useful as they are very rare and out of scope for a
> useful temperature.
>
>     phys = (raw * 1.0) - 40
>
> So far, I have been converting the raw data into floating point data before
> writing to arrow format to make it easier for the analysts to use the
> data. This of course means that I am converting to a less efficient format
> and I am also losing inherent information about the raw signal. I would
> rather be able to store the raw data in an appropriately sized unsigned
> integer and automatically convert to floating point when using the data,
> similar to dictionary encoding.
>
> Discussion
>
> - How would people generally deal with this situation using the arrow
>   format?
> - Is this something that other people are interested in?
> - If this were to be added to the spec, what would be the best way to do
>   it?
>
> While I am coming from an automotive perspective, I think there are many
> other areas of applicability (reading sensor data through an ADC,
> industrial automation and monitoring, etc.)
>
> I could see this working as either a new primitive type (similar to
> decimal), or as an extension where we simply put the factor and offset as
> standard metadata fields.
>
> Best regards,
> Elliot Morrison-Reed
>
>
