Hi,

in fact, the MDF spec includes it mostly not for compression (that’s a nice side effect) but to be able to express the (physical) content of a signal.
So my initial idea was to implement it as an optional layer on top of the current TsFile that does the "interpretation", because what is stored in the TsFile is always just a "primitive" series.
So the idea would be to store some metadata (like a formula, lookup table, ...) on creation and use it on reading, but only optionally.
You can look at how Avro handles non-primitive types (they call them logical types) here:
https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types
This is similar to my idea.

Julian

On 29.10.19, 14:40, "Xiangdong Huang" <[email protected]> wrote:

Hi,

> Then it's most efficient to store integers and a formula like a * x + b, with e.g. b = 3 and a = 1/100.
> So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> So we only store 0 to 120 and no decimals and stuff, which would be very easily compressible, I think.

Good idea! Two thumbs up for that.

But for cases like the above, implementing a new encoding method is better than a new data type.
E.g., create time series root.a.b.voltage with encoding = linear_transformation and encoding_parameter = "describe the function like y=a * x + b" and datatype = INT.
"linear_transformation" is the new encoding method.

So far I see two cases in the discussion: one is Optional-like data, and the other is data that can be transformed.
So, do we want to support these two cases separately, or find a more general "rich data type" (can the MDF format provide some inspiration)?

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院

Julian Feinauer <[email protected]> wrote on Tue, Oct 29, 2019 at 8:26 PM:

> Hi Xiangdong,
>
> to your second question:
> The use case is the other way round.
> We know that we measure e.g. a voltage between 3V and 4.2V with a precision of 0.01 or something.
> Then it's most efficient to store integers and a formula like a * x + b, with e.g. b = 3 and a = 1/100.
> So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> So we only store 0 to 120 and no decimals and stuff, which would be very easily compressible, I think.
>
> Julian
>
> On 29.10.19, 07:13, "Xiangdong Huang" <[email protected]> wrote:
>
> Hi,
>
> > In Java we could model it as a variable Optional<Boolean> x which could be null,
> > Optional.empty(), Optional.of(true), Optional.of(false).
>
> That makes sense. And using a new data type to achieve it in IoTDB is OK.
>
> > Or scale formulas like a*x+b, which make it possible to leverage the precision even
> > for “small” double values or even integers.
>
> So, are you considering a use case like: the time series values should be
> [1, 1, 0, 0, 1, 1, 1, 0, 0...] but actually we get [0.99, 0.99, 0.01, 0, 1, 1, 0.999, 0, 0.01]
> (because of the precision of sensors)?
> And which values do you want to save?
> (1) Save them as 1 and 0. Or,
> (2) Save them as 0.99, 0.01 as measured, but use a specific query API to return data like 1 and 0?
>
> My other question is: is there a general data type that can support the above cases?
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
> Julian Feinauer <[email protected]> wrote on Tue, Oct 29, 2019 at 3:58 AM:
>
> > Hi all,
> >
> > I wanted to discuss a possible new feature, which I will call the Rich Datatypes (RDT) API in the following.
> > I worked a lot in the automotive industry, and there is a broadly adopted
> > open standard called ASAM MDF (https://www.asam.net/standards/detail/mdf/).
> > It is a format targeted at efficient storage, but at the same time it supports
> > VERY complex types (which are often used in automotive controllers).
> >
> > Take something as simple as a boolean. We could store it as a boolean (a Java boolean) in 1 bit.
> > BUT we have four possibilities overall:
> >
> > * No value is available for a timestamp (NULL / nothing stored)
> > * We had a successful request, but the controller does not know whether the value is
> >   true or false (or had an internal error); this is a bit like Optional.isPresent() == false
> > * True
> > * False
> >
> > In Java we could model it as a variable Optional<Boolean> x which could be null,
> > Optional.empty(), Optional.of(true), Optional.of(false).
> >
> > Other examples are discrete values like “ON”, “OFF” (which are handled internally as
> > “lookup tables” on integer rows).
> > Or scale formulas like a*x+b, which make it possible to leverage the precision even
> > for “small” double values or even integers.
> > Or a formula combined with a “fallback” lookup value like “NV”.
> >
> > I think this could be a valuable extension to IoTDB as an additional API
> > (not changing anything below, just providing an API on top to do the calculation).
> >
> > What do others think?
> >
> > Julian
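
For illustration, the a * x + b scaling discussed in this thread could be sketched roughly as follows. This is only a sketch under assumptions: the class LinearScaling and its methods encode/decode are hypothetical names and are not part of IoTDB or TsFile, and rounding to the nearest integer on write is an assumption as well.

```java
// Hypothetical sketch: LinearScaling is NOT an existing IoTDB or TsFile class.
// It only illustrates storing a raw integer x and reading back y = a * x + b.
final class LinearScaling {
    private final double a; // scale factor, e.g. 1/100 for a precision of 0.01
    private final double b; // offset, e.g. 3 for a range starting at 3 V

    LinearScaling(double a, double b) {
        if (a == 0.0) {
            throw new IllegalArgumentException("scale factor a must be non-zero");
        }
        this.a = a;
        this.b = b;
    }

    /** Physical value to raw integer: x = round((y - b) / a). Rounding is an assumption. */
    long encode(double physical) {
        return Math.round((physical - b) / a);
    }

    /** Raw integer to physical value: y = a * x + b. */
    double decode(long raw) {
        return a * raw + b;
    }

    public static void main(String[] args) {
        // Values from the voltage example: range 3 V to 4.2 V, precision 0.01 V.
        LinearScaling voltage = new LinearScaling(0.01, 3.0);
        for (double v : new double[] {3.0, 3.01, 4.2}) {
            long raw = voltage.encode(v);
            System.out.println(v + " V -> x = " + raw + " -> " + voltage.decode(raw) + " V");
        }
    }
}
```

With a = 1/100 and b = 3, the values 3 V, 3.01 V and 4.2 V are stored as x = 0, 1 and 120. Only the small integers would be written to the series; the parameters a and b would live in the metadata and be applied optionally on read.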
