Hey, thank you for the link. I did not know about this; it is exactly what I was looking for!

Julian

PS: Looking forward to your PR :)

On 05.03.19, 12:26, "Xiangdong Huang" <saint...@gmail.com> wrote:

Hi,

1. We have a document that introduces the format:
   https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
2. The new API for recovering data is almost done. I am writing the UTs now.
   Maybe I can submit a PR tonight (if everything is fine...)

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院

Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Tue, Mar 5, 2019 at 6:00 PM:

> Hi Xiangdong,
>
> that sounds excellent.
> Do you have a short overview of how the file format is designed on disk?
> I know that it's somewhat similar to Parquet, but I did not find more details.
> Basically, what would suffice for us would be something like skipping an
> invalid column group (or whatever you call it) and going on with the next one.
>
> Julian
>
> On 04.03.19, 13:21, "Xiangdong Huang" <saint...@gmail.com> wrote:
>
> Hi,
>
> If so, I think I need to add a new API that allows you to continue writing
> data to an existing TsFile that was not closed correctly. Then everything is
> fine for you :D
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
> Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019 at 8:08 PM:
>
> > Hey Xiangdong,
> >
> > thanks for the great explanation.
> > In fact, I agree with you that it would be best if we start to play around
> > with it and report all our findings or wishes back to this list (that
> > proved to be beneficial in plc4x as well).
> >
> > You confirm my thoughts about the two "levels" of APIs (DB and file), and
> > the file API is exactly what we were looking for for our use case.
> > We do not care much about data loss (when an edge device fails, it's...
> > gone).
> > The crucial point for us is that no corrupt files can be generated.
> > This means I'm fine when the last data submitted is lost, but I'm not fine
> > if we can get into a situation where the last data file is completely lost
> > (well, perhaps this could be acceptable).
> >
> > @tim: Perhaps it's best if you give Xiangdong some more information about
> > our idea, and we can also point to our current code on GitHub.
> >
> > Julian
> >
> > On 04.03.19, 13:03, "Xiangdong Huang" <saint...@gmail.com> wrote:
> >
> > Hi,
> >
> > The TsFile API is not deprecated. In fact, it is designed for this
> > scenario and for MapReduce/Spark computing.
> >
> > If you just use the Reader and Writer API, there is something you need to
> > know:
> >
> > Let's suppose your block size is x bytes (tsfile-format.properties:
> > group_size_in_byte).
> >
> > 1. If you write data and a shutdown occurs, then all data that has been
> > flushed to disk is OK, and you can read the data (class
> > org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you need to
> > change it a little. I think I can write an example.)
> >
> > 2. Actually, TsFile has the ability to let you continue writing data at
> > the end of the incomplete file. However, we do not provide this API now...
> > If needed, I can add the API.
> >
> > 3. In this scenario, you will lose at most x bytes of data. If you cannot
> > accept that, something like a WAL is needed. (It is not very complex, but I
> > am not sure whether it should be an embedded function of TsFile.)
> >
> > Up to now, we can consider the TsFile API suitable for your scenario
> > (even though we need to add a little more API if you desire). And you get
> > the ability to compress data and to query data from the TsFile rather
> > than scanning the data from head to tail.
> >
> > However, TsFile has one constraint: you cannot write out-of-order data
> > into a TsFile, otherwise the query API may return incomplete results.
> > But I think it is OK for real applications, because I do not think that a
> > device can generate out-of-order data....
> >
> > For example, if you write two devices' data into one TsFile, it is OK if
> > you write data like:
> > - d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5 ....
> > or:
> > - d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...
> >
> > But you cannot write data like:
> > - d1.m1.t2, d1.m1.t1 ...
> >
> > I think it is a good chance to improve TsFile and make it more suitable
> > for real applications, so please do not hesitate to tell me more about
> > what you think TsFile should have.
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> > 黄向东
> > 清华大学 软件学院
> >
> > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019 at 7:17 PM:
> >
> > > Hi Xiangdong,
> > >
> > > thanks for the info.
> > > How is it in the case where you use the Reader / Writer API for the
> > > TsFiles directly (or should this be considered "deprecated")?
> > > Can these files end up in a corrupted state?
> > >
> > > One situation where we have to deal with this is "at the edge", when we
> > > have devices inside large machines.
> > > Usually, at the end of the shift, these machines (and therefore our
> > > devices) are powered off hard, so no shutdown or de-initialization is
> > > possible.
> > >
> > > Best
> > > Julian
> > >
> > > On 04.03.19, 12:14, "Xiangdong Huang" <saint...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > IoTDB can run either on a 24/7 server or on a Raspberry Pi. We have
> > > tested both scenarios.
> > >
> > > When you shut down an IoTDB instance by force (e.g., power off) and
> > > restart it, no data is lost (if you enable the WAL).
> > >
> > > However, currently we do not optimize the time cost of the restart
> > > process.
> > > It is an important feature that we need to work on, because we hope
> > > IoTDB can support data management both on edge devices and in the data
> > > center.
> > >
> > > Also, the default configuration is not well suited for running on an
> > > edge device (e.g., the block size is 128 MB, which is too large for a
> > > Raspberry Pi and will slow down the restart process because there is too
> > > much WAL data on disk).
> > >
> > > Best,
> > > -----------------------------------
> > > Xiangdong Huang
> > > School of Software, Tsinghua University
> > >
> > > 黄向东
> > > 清华大学 软件学院
> > >
> > > Tim Mitsch <t.mit...@pragmaticindustries.de> wrote on Mon, Mar 4, 2019 at 6:53 PM:
> > >
> > > > Hello development team,
> > > >
> > > > First of all, thanks for developing this interesting project and
> > > > bringing it into the Apache Incubator.
> > > >
> > > > I have a question regarding the place of operation and robustness:
> > > >
> > > > * Is IoTDB designed as an application on a server that is running
> > > >   24/7, or
> > > > * Is it also possible to run it on a device like a Raspberry Pi or
> > > >   IPC, where operation can be interrupted?
> > > >
> > > > I'm asking because I'm searching for a solution for temporary storage
> > > > that is robust against spontaneous interruption, e.g. switching off
> > > > electricity without a regular shutdown of the OS. Have you tested
> > > > something like this yet?
> > > >
> > > > Best regards
> > > > Tim
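
For reference, the ordering constraint Xiangdong describes in the thread (timestamps may interleave across devices and measurements, but must not go backwards within one series) can be checked on the writer side before a point is handed to TsFile. The following is only a rough sketch under assumptions: `SeriesOrderGuard` and `accept` are hypothetical names and not part of the TsFile API, and treating a repeated timestamp as out of order is my own assumption, since the thread does not say how duplicates are handled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the TsFile API. */
final class SeriesOrderGuard {
    // Last accepted timestamp per series, keyed by "device.measurement".
    private final Map<String, Long> lastTimestamp = new HashMap<>();

    /**
     * Returns true and records the timestamp if it is newer than the last
     * accepted timestamp of this series; returns false otherwise.
     * Assumption: a repeated timestamp counts as out of order.
     */
    boolean accept(String device, String measurement, long timestamp) {
        String series = device + "." + measurement;
        Long last = lastTimestamp.get(series);
        if (last != null && timestamp <= last) {
            return false; // would go backwards (or repeat) within this series
        }
        lastTimestamp.put(series, timestamp);
        return true;
    }
}
```

A caller would ask `accept(...)` before each write and drop or buffer any point that is rejected, so that sequences such as d1.m1.t1, d1.m1.t2, d1.m2.t1 pass while d1.m1.t2 followed by d1.m1.t1 does not.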