[ 
https://issues.apache.org/jira/browse/IOTDB-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-78.
--------------------------------

This issue was solved when implementing the new storage engine.

> make Overflow file format more similar with TsFile
> --------------------------------------------------
>
>                 Key: IOTDB-78
>                 URL: https://issues.apache.org/jira/browse/IOTDB-78
>             Project: Apache IoTDB
>          Issue Type: Sub-task
>            Reporter: xiangdong Huang
>            Assignee: xiangdong Huang
>            Priority: Major
>         Attachments: DisorderedTsFileTest.java
>
>
> If we make the Overflow file format similar to TsFile, we can make our code
> more concise and maintainable.
>  
> Actually, the current Overflow file already consists of chunk groups. The only
> structural difference is that each chunk group's metadata (i.e., the index) is
> written at the end of the chunk group rather than at the end of the file. If we
> change that, the file format becomes exactly the same as TsFile.
> However, given a device, its chunk groups are not time-ordered in an Overflow
> file, whereas they SHOULD be time-ordered in a TsFile. That is the only
> remaining difference.
>  
> But after a test, I find that TsFile supports writing chunk groups out of time
> order (actually, the time-order guarantee is enforced in the IoTDB module,
> rather than in the TsFile module).
> See the code:
> {code:java}
> // 'f' is the test file and 'path' is its path (fields of the test class).
> @Test
> public void writeReadDisorderDataTest() throws IOException, WriteProcessException {
>   try (TsFileWriter tsFileWriter = new TsFileWriter(f)) {
>     // add measurements into file schema
>     tsFileWriter.addMeasurement(
>         new MeasurementSchema("sensor_1", TSDataType.FLOAT, TSEncoding.RLE));
>     // write t3~t6, flush, t3~t6 again, flush, then t1~t2
>     for (int j = 0; j < 2; j++) {
>       for (long i = 3; i < 7; i++) {
>         TSRecord tsRecord = new TSRecord(i, "device_1");
>         DataPoint dPoint1 = new FloatDataPoint("sensor_1", 1.2f);
>         tsRecord.addTuple(dPoint1);
>         tsFileWriter.write(tsRecord);
>       }
>       tsFileWriter.flushForTest();
>     }
>     for (long i = 1; i < 3; i++) {
>       TSRecord tsRecord = new TSRecord(i, "device_1");
>       DataPoint dPoint1 = new FloatDataPoint("sensor_1", 1.2f);
>       tsRecord.addTuple(dPoint1);
>       tsFileWriter.write(tsRecord);
>     }
>   }
>   // read data chunk by chunk using TsFileSequenceReader and ChunkReader
>   try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
>     reader.readHeadMagic();
>     List<Chunk> chunks;
>     while ((chunks = readNextChunk(reader)).size() > 0) {
>       for (Chunk chunk : chunks) {
>         ChunkHeader header = chunk.getHeader();
>         System.out.println(header.toString());
>         ChunkReader reader1 = new ChunkReaderWithoutFilter(chunk);
>         while (reader1.hasNextBatch()) {
>           BatchData data = reader1.nextBatch();
>           while (data.hasNext()) {
>             System.out.println(data.currentTime() + ", " + data.currentValue());
>             data.next();
>           }
>         }
>       }
>       ChunkGroupFooter footer = reader.readChunkGroupFooter();
>       System.out.println(footer.toString());
>     }
>   }
>   // read data series by series using FileSeriesReader
>   try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
>     MetadataQuerierByFileImpl metadataQuerier = new MetadataQuerierByFileImpl(reader);
>     List<ChunkMetaData> metaDataList =
>         metadataQuerier.getChunkMetaDataList(new Path("device_1", "sensor_1"));
>     ChunkLoader chunkLoader = new ChunkLoaderImpl(reader);
>     FileSeriesReader reader1 = new FileSeriesReaderWithoutFilter(chunkLoader, metaDataList);
>     while (reader1.hasNextBatch()) {
>       BatchData data = reader1.nextBatch();
>       while (data.hasNext()) {
>         System.out.println(data.currentTime() + ", " + data.currentValue());
>         data.next();
>       }
>     }
>     reader1.close();
>   }
>   assertTrue(f.delete());
> }
>
> List<Chunk> readNextChunk(TsFileSequenceReader reader) throws IOException {
>   List<Chunk> result = new ArrayList<>();
>   while (reader.readMarker() == MetaMarker.CHUNK_HEADER) {
>     ChunkHeader header = reader.readChunkHeader();
>     ByteBuffer data = reader.readChunk(header);
>     result.add(new Chunk(header, data));
>   }
>   return result;
> }
> {code}
>  The output is:
> {panel:title=Output}
> CHUNK_HEADER\{measurementID='sensor_1', dataSize=89, dataType=FLOAT, compressionType=UNCOMPRESSED, encodingType=RLE, numOfPages=1, serializedSize=35}
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> CHUNK_GROUP_FOOTER\{deviceID='device_1', dataSize=124, numberOfChunks=1, serializedSize=25}
> CHUNK_HEADER\{measurementID='sensor_1', dataSize=89, dataType=FLOAT, compressionType=UNCOMPRESSED, encodingType=RLE, numOfPages=1, serializedSize=35}
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> CHUNK_GROUP_FOOTER\{deviceID='device_1', dataSize=124, numberOfChunks=1, serializedSize=25}
> CHUNK_HEADER\{measurementID='sensor_1', dataSize=89, dataType=FLOAT, compressionType=UNCOMPRESSED, encodingType=RLE, numOfPages=1, serializedSize=35}
> 1, 1.2
> 2, 1.2
> CHUNK_GROUP_FOOTER\{deviceID='device_1', dataSize=124, numberOfChunks=1, serializedSize=25}
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> 1, 1.2
> 2, 1.2
> {panel}
> Here we write times 3~6 twice, and then times 1~2.
> We can see that both ChunkReader and FileSeriesReader (the WithoutFilter
> variants) read the data back correctly; they simply return it in write order,
> which is acceptable because a correct TsFile contains no such out-of-order data.
> So perhaps we only need to make small changes to the current *Reader classes,
> and then we can support Overflow files.
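The layout difference described in the issue can be sketched with a tiny model. The classes below are hypothetical and illustrative only (not the real TsFile/Overflow API): a TsFile gathers all chunk-group metadata at the end of the file, while an Overflow file writes each chunk group's metadata immediately after that group.

```java
import java.util.ArrayList;
import java.util.List;

// A simplified, hypothetical model of the two layouts discussed in the issue.
// These names are illustrative only; they are not the real TsFile/Overflow API.
public class LayoutSketch {

  // One chunk group of data for a device, covering times [startTime, endTime].
  record ChunkGroup(String deviceId, long startTime, long endTime) {}

  // TsFile-style layout: [data][data]...[all metadata at the end of the file]
  static List<String> tsFileLayout(List<ChunkGroup> groups) {
    List<String> file = new ArrayList<>();
    for (ChunkGroup g : groups) {
      file.add("DATA:" + g.deviceId());
    }
    for (ChunkGroup g : groups) { // index written once, at the end of the file
      file.add("META:" + g.deviceId() + "[" + g.startTime() + "," + g.endTime() + "]");
    }
    return file;
  }

  // Overflow-style layout: [data][its metadata][data][its metadata]...
  static List<String> overflowLayout(List<ChunkGroup> groups) {
    List<String> file = new ArrayList<>();
    for (ChunkGroup g : groups) {
      file.add("DATA:" + g.deviceId());
      file.add("META:" + g.deviceId() + "[" + g.startTime() + "," + g.endTime() + "]");
    }
    return file;
  }

  public static void main(String[] args) {
    // As in the issue, chunk groups need not be time-ordered: t3~t6 precedes t1~t2.
    List<ChunkGroup> groups = List.of(
        new ChunkGroup("device_1", 3, 6),
        new ChunkGroup("device_1", 1, 2));
    System.out.println(tsFileLayout(groups));
    System.out.println(overflowLayout(groups));
  }
}
```

Moving the META entries from after each DATA entry to the end of the file is the only structural change the issue proposes; the out-of-order time ranges are what the *Reader classes must tolerate.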



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
