[
https://issues.apache.org/jira/browse/IOTDB-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xiangdong Huang closed IOTDB-78.
--------------------------------
This issue was solved while implementing the new storage engine.
> make Overflow file format more similar with TsFile
> --------------------------------------------------
>
> Key: IOTDB-78
> URL: https://issues.apache.org/jira/browse/IOTDB-78
> Project: Apache IoTDB
> Issue Type: Sub-task
> Reporter: xiangdong Huang
> Assignee: xiangdong Huang
> Priority: Major
> Attachments: DisorderedTsFileTest.java
>
>
> If we make the Overflow file format similar to TsFile, we can make our code
> more concise and maintainable.
>
> Actually, the current Overflow file also consists of chunk groups. The main
> structural difference is that a chunk group's metadata (i.e., the index) is
> stored at the end of each chunk group rather than at the end of the file. If
> we change that, the file format becomes exactly the same as TsFile.
> One difference remains: given a device, its chunk groups are not
> time-ordered in an Overflow file, but they SHOULD be time-ordered in a TsFile.
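> The layout difference described above can be sketched as follows (a toy
> model only, not the real TsFile/Overflow API; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two on-disk layouts; names are illustrative only.
public class LayoutSketch {

  // Overflow: each chunk group is immediately followed by its own index.
  static List<String> overflowLayout(int groups) {
    List<String> file = new ArrayList<>();
    for (int g = 0; g < groups; g++) {
      file.add("chunkGroup" + g);
      file.add("index" + g);
    }
    return file;
  }

  // TsFile: chunk groups come first; the index covering all groups is
  // written once, at the end of the file.
  static List<String> tsFileLayout(int groups) {
    List<String> file = new ArrayList<>();
    for (int g = 0; g < groups; g++) {
      file.add("chunkGroup" + g);
    }
    file.add("fileIndex");
    return file;
  }
}
```

> Moving the per-group index to a single index at the file end makes the two
> layouts coincide.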
>
> However, after a test, I found that TsFile supports writing chunk groups out
> of time order (actually, the time-order guarantee is enforced in the IoTDB
> module, not in the TsFile module).
> See the code:
> {code:java}
> @Test
> public void writeReadDisorderDataTest() throws IOException, WriteProcessException {
>   // f is the target File and path its file path (fields of the attached test class)
>   try (TsFileWriter tsFileWriter = new TsFileWriter(f)) {
>     // add measurements into the file schema
>     tsFileWriter.addMeasurement(
>         new MeasurementSchema("sensor_1", TSDataType.FLOAT, TSEncoding.RLE));
>     // write t3~t6, flush, t3~t6 again, flush, then t1~t2
>     for (int j = 0; j < 2; j++) {
>       for (long i = 3; i < 7; i++) {
>         TSRecord tsRecord = new TSRecord(i, "device_1");
>         DataPoint dPoint1 = new FloatDataPoint("sensor_1", 1.2f);
>         tsRecord.addTuple(dPoint1);
>         tsFileWriter.write(tsRecord);
>       }
>       tsFileWriter.flushForTest();
>     }
>     for (long i = 1; i < 3; i++) {
>       TSRecord tsRecord = new TSRecord(i, "device_1");
>       DataPoint dPoint1 = new FloatDataPoint("sensor_1", 1.2f);
>       tsRecord.addTuple(dPoint1);
>       tsFileWriter.write(tsRecord);
>     }
>   }
>   // read the data chunk by chunk, in file order, using TsFileSequenceReader
>   try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
>     reader.readHeadMagic();
>     List<Chunk> chunks;
>     while ((chunks = readNextChunk(reader)).size() > 0) {
>       for (Chunk chunk : chunks) {
>         ChunkHeader header = chunk.getHeader();
>         System.out.println(header.toString());
>         ChunkReader reader1 = new ChunkReaderWithoutFilter(chunk);
>         while (reader1.hasNextBatch()) {
>           BatchData data = reader1.nextBatch();
>           while (data.hasNext()) {
>             System.out.println(data.currentTime() + ", " + data.currentValue());
>             data.next();
>           }
>         }
>       }
>       ChunkGroupFooter footer = reader.readChunkGroupFooter();
>       System.out.println(footer.toString());
>     }
>   }
>   // read the same data as a time series, via the metadata, using FileSeriesReader
>   try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
>     MetadataQuerierByFileImpl metadataQuerier = new MetadataQuerierByFileImpl(reader);
>     List<ChunkMetaData> metaDataList =
>         metadataQuerier.getChunkMetaDataList(new Path("device_1", "sensor_1"));
>     ChunkLoader chunkLoader = new ChunkLoaderImpl(reader);
>     FileSeriesReader reader1 = new FileSeriesReaderWithoutFilter(chunkLoader, metaDataList);
>     while (reader1.hasNextBatch()) {
>       BatchData data = reader1.nextBatch();
>       while (data.hasNext()) {
>         System.out.println(data.currentTime() + ", " + data.currentValue());
>         data.next();
>       }
>     }
>     reader1.close();
>   }
>   assertTrue(f.delete());
> }
>
> List<Chunk> readNextChunk(TsFileSequenceReader reader) throws IOException {
>   List<Chunk> result = new ArrayList<>();
>   while (reader.readMarker() == MetaMarker.CHUNK_HEADER) {
>     ChunkHeader header = reader.readChunkHeader();
>     ByteBuffer data = reader.readChunk(header);
>     result.add(new Chunk(header, data));
>   }
>   return result;
> }
> {code}
> The output is:
> {panel:title=Output}
> CHUNK_HEADER\{measurementID='sensor_1', dataSize=89, dataType=FLOAT,
> compressionType=UNCOMPRESSED, encodingType=RLE, numOfPages=1,
> serializedSize=35}
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> CHUNK_GROUP_FOOTER\{deviceID='device_1', dataSize=124, numberOfChunks=1,
> serializedSize=25}
> CHUNK_HEADER\{measurementID='sensor_1', dataSize=89, dataType=FLOAT,
> compressionType=UNCOMPRESSED, encodingType=RLE, numOfPages=1,
> serializedSize=35}
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> CHUNK_GROUP_FOOTER\{deviceID='device_1', dataSize=124, numberOfChunks=1,
> serializedSize=25}
> CHUNK_HEADER\{measurementID='sensor_1', dataSize=89, dataType=FLOAT,
> compressionType=UNCOMPRESSED, encodingType=RLE, numOfPages=1,
> serializedSize=35}
> 1, 1.2
> 2, 1.2
> CHUNK_GROUP_FOOTER\{deviceID='device_1', dataSize=124, numberOfChunks=1,
> serializedSize=25}
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> 3, 1.2
> 4, 1.2
> 5, 1.2
> 6, 1.2
> 1, 1.2
> 2, 1.2
> {panel}
> Here we write timestamps 3~6 twice, and then 1~2.
> We can see that both ChunkReader and FileSeriesReader (WithoutFilter) read
> the data correctly. (A correct TsFile would never contain such data, so this
> behavior is acceptable.)
> So, maybe we only need small changes to the current *Reader classes to
> support Overflow files.
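> As a sketch of one such change (hypothetical names, not the actual *Reader
> API): before loading chunks from an Overflow file, a reader could sort the
> chunk metadata by start time, so queries still see time-ordered data even
> though chunk groups are not time-ordered on disk:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: reorder chunk metadata from file order to time order.
// ChunkMeta stands in for the real chunk-metadata class.
public class TimeOrderedMetadata {

  static final class ChunkMeta {
    final long startTime;
    ChunkMeta(long startTime) { this.startTime = startTime; }
  }

  // Return the chunk metadata sorted by start time, leaving file order intact.
  static List<ChunkMeta> timeOrdered(List<ChunkMeta> fileOrder) {
    List<ChunkMeta> sorted = new ArrayList<>(fileOrder);
    sorted.sort(Comparator.comparingLong((ChunkMeta m) -> m.startTime));
    return sorted;
  }
}
```

> A reader would still need to merge chunks with overlapping time ranges (as
> in the test above, where 3~6 is written twice), so sorting alone is only the
> first step.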
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)