Copilot commented on code in PR #18852:
URL: https://github.com/apache/pinot/pull/18852#discussion_r3486025461
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/SingleFileIndexDirectory.java:
##########
@@ -525,12 +531,14 @@ public Set<String> getColumnsWithIndex(IndexType<?, ?, ?> type) {
}
}
if (type == StandardIndexes.vector()) {
+ // Vector may live as a combined file (legacy / storeInSegmentFile=false) or as a typed
+ // entry in columns.psf (storeInSegmentFile=true). Collect both. Removed the early-return
+ // that previously hid consolidated entries from this view.
for (String column : _segmentMetadata.getAllColumns()) {
if (VectorIndexUtils.hasVectorIndex(_segmentDirectory, column)) {
columns.add(column);
}
}
Review Comment:
`getColumnsWithIndex(StandardIndexes.vector())` only adds columns when
`hasVectorIndex(...)` is true, so columns that only have the transient
`*.combined.index` form will be omitted. That can break handler/loader logic
that relies on this set during migrations or crash recovery.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/SingleFileIndexDirectory.java:
##########
@@ -160,8 +160,11 @@ public boolean hasIndexFor(String column, IndexType<?, ?, ?> type) {
if (type == StandardIndexes.text() && TextIndexUtils.hasTextIndex(_segmentDirectory, column)) {
return true;
}
- if (type == StandardIndexes.vector()) {
- return VectorIndexUtils.hasVectorIndex(_segmentDirectory, column);
+ // Vector index may live either as a combined file (legacy / storeInSegmentFile=false) or as
+ // a typed entry inside columns.psf (storeInSegmentFile=true). Check both — mirror the text
+ // pattern of "combined OR _columnEntries".
+ if (type == StandardIndexes.vector() && VectorIndexUtils.hasVectorIndex(_segmentDirectory, column)) {
+ return true;
}
Review Comment:
`hasIndexFor(..., StandardIndexes.vector())` only checks
`VectorIndexUtils.hasVectorIndex`, which deliberately excludes the new
`*.combined.index` transient form. If a segment directory temporarily contains
only the combined-form vector index file (e.g. crash/rollback before absorb),
`hasIndexFor` will return false and the vector reader will never be constructed
even though `SegmentDirectoryPaths.findVectorIndexIndexFile(...)` can now
resolve the combined file.
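A minimal sketch of the "check both forms" idea, assuming the fix is simply to treat the transient combined file as evidence of an index. `VectorArtifactProbe` and its methods are hypothetical names, not Pinot APIs, and the suffix lists are passed in by the caller rather than taken from `V1Constants`. In the PR itself the equivalent would presumably be `hasVectorIndex(...) || hasCombinedFormVectorIndex(...)`.

```java
import java.io.File;
import java.util.Collection;

/** Illustrative only: a stand-in for the presence checks discussed above. */
public final class VectorArtifactProbe {
  private VectorArtifactProbe() {
  }

  /** Returns true when dir contains an entry named column + suffix for any given suffix. */
  public static boolean hasEntryWithAnySuffix(File dir, String column, Collection<String> suffixes) {
    for (String suffix : suffixes) {
      // exists() rather than isFile(): a legacy HNSW index is a directory, not a file.
      if (new File(dir, column + suffix).exists()) {
        return true;
      }
    }
    return false;
  }

  /** Reports an index when either the settled sidecar form or the transient combined form exists. */
  public static boolean hasVectorArtifact(File dir, String column, Collection<String> sidecarSuffixes,
      Collection<String> combinedSuffixes) {
    return hasEntryWithAnySuffix(dir, column, sidecarSuffixes)
        || hasEntryWithAnySuffix(dir, column, combinedSuffixes);
  }
}
```

The same predicate could back both `hasIndexFor` and `getColumnsWithIndex`, so the two views cannot disagree about which columns carry a vector index.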
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/VectorIndexUtils.java:
##########
@@ -76,6 +95,39 @@ static boolean hasVectorIndex(File segDir, String column) {
|| new File(segDir, column +
Indexes.VECTOR_IVF_PQ_INDEX_FILE_EXTENSION).exists();
}
+ /// Returns {@code true} when the V1/V2 segment directory holds an IVF
vector index in the
+ /// combined-form extension ({@code .vector.ivfflat.combined.index} or
+ /// {@code .vector.ivfpq.combined.index}). The combined form is written by
an IVF creator run
+ /// with {@code storeInSegmentFile=true} and is meant to be packed into
{@code columns.psf} by
+ /// the V2→V3 converter, not preserved as a sibling.
+ public static boolean hasCombinedFormVectorIndex(File segDir, String column)
{
+ return new File(segDir, column +
Indexes.VECTOR_IVF_FLAT_COMBINED_INDEX_FILE_EXTENSION).exists()
+ || new File(segDir, column +
Indexes.VECTOR_IVF_PQ_COMBINED_INDEX_FILE_EXTENSION).exists()
+ || new File(segDir, column +
Indexes.VECTOR_HNSW_COMBINED_INDEX_FILE_EXTENSION).exists();
+ }
+
+ /// Returns the {@code columns.psf} typed-entry buffer holding the column's
consolidated vector
+ /// index, or {@code null} when no such entry has been packed into {@code
columns.psf} yet.
+ ///
+ /// Unlike {@link SegmentDirectory.Reader#hasIndexFor}, this does NOT report
a legacy on-disk
+ /// sidecar (an IVF flat file or an HNSW Lucene directory) as a match — only
a real packed
+ /// `_columnEntries` slot counts. {@code SingleFileIndexDirectory} signals
an absent typed slot by
+ /// throwing an unchecked exception from {@code getIndexFor}; that is mapped
to {@code null} here,
+ /// while genuine I/O failures propagate. Callers use this to tell "a prior
absorb already
+ /// committed bytes" (crash recovery) apart from "first absorb of an
existing sidecar", and to
+ /// select the {@code columns.psf} read path only when the consolidated
entry truly exists.
+ ///
+ /// The returned buffer is owned by the segment directory and must NOT be
closed by the caller.
+ @Nullable
+ public static PinotDataBuffer
getConsolidatedVectorEntry(SegmentDirectory.Reader reader, String column)
+ throws IOException {
+ try {
+ return reader.getIndexFor(column, StandardIndexes.vector());
+ } catch (RuntimeException e) {
+ return null;
+ }
Review Comment:
`getConsolidatedVectorEntry` catches *all* `RuntimeException` from
`reader.getIndexFor(...)` and converts it to `null`. In
`SingleFileIndexDirectory`, `getIndexFor` throws `RuntimeException` not only
for "missing typed entry" but also for corruption scenarios (e.g. missing magic
marker). Swallowing those will make real corruption look like "absent entry"
and can cause migration/crash-recovery to proceed on a broken segment without
surfacing the underlying problem.
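One possible shape for the fix, sketched under the assumption that the reader can answer "is there a typed entry?" without throwing. `EntryReader`, `hasTypedEntry`, and `getTypedEntry` are hypothetical stand-ins for the relevant slice of `SegmentDirectory.Reader`, not its real methods. The point is that absence is decided by an explicit check, so any exception from the read itself propagates.

```java
/** Illustrative only: absence is detected explicitly instead of by catching RuntimeException. */
public final class ConsolidatedEntryLookup {
  /** Hypothetical stand-in for the part of the segment reader this sketch needs. */
  public interface EntryReader {
    boolean hasTypedEntry(String column);

    byte[] getTypedEntry(String column);
  }

  private ConsolidatedEntryLookup() {
  }

  /**
   * Returns null only when the entry is known to be absent. Failures while reading an entry
   * that should exist (for example a missing magic marker) are not caught here.
   */
  public static byte[] findEntryOrNull(EntryReader reader, String column) {
    if (!reader.hasTypedEntry(column)) {
      return null;
    }
    return reader.getTypedEntry(column);
  }
}
```

If no such presence check is available, catching a dedicated "entry not found" exception type would achieve the same separation; catching `RuntimeException` wholesale does not.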
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/vector/VectorIndexType.java:
##########
@@ -210,25 +211,81 @@ public VectorIndexReader
createIndexReader(SegmentDirectory.Reader segmentReader
return null;
}
VectorBackendType backendType = indexConfig.resolveBackendType();
- File configuredIndexFile =
- SegmentDirectoryPaths.findVectorIndexIndexFile(segmentDir,
metadata.getColumnName(), indexConfig);
- if (configuredIndexFile == null || !configuredIndexFile.exists()) {
- LOGGER.warn("Skipping vector index reader for column: {} because
configured backend {} does not have a "
- + "matching on-disk artifact in segment: {}",
- metadata.getColumnName(), backendType, segmentDir);
- return null;
+ String column = metadata.getColumnName();
+
+ if (backendType == VectorBackendType.HNSW) {
+ // Combined form: load the HNSW index from the typed entry inside
columns.psf when one
+ // actually exists. getConsolidatedVectorEntry (unlike hasIndexFor)
does not mistake the
+ // legacy Lucene directory for a packed entry, so a segment whose
handler has not yet
+ // migrated falls through to the legacy path below instead of failing
the whole load. The
+ // buffer is owned by the segment directory — this reader must not
close it.
+ if (indexConfig.isStoreInSegmentFile()) {
+ PinotDataBuffer buffer;
+ try {
+ buffer =
VectorIndexUtils.getConsolidatedVectorEntry(segmentReader, column);
+ } catch (IOException e) {
+ throw new RuntimeException(
+ "Failed to read consolidated HNSW vector index from
columns.psf for column: " + column, e);
+ }
+ if (buffer != null) {
+ return new HnswVectorIndexReader(column, buffer,
metadata.getTotalDocs(), indexConfig);
+ }
+ LOGGER.warn("storeInSegmentFile=true but no consolidated HNSW entry
found in columns.psf for column: {} "
+ + "in segment: {}; falling back to the on-disk Lucene
directory", column, segmentDir);
+ }
+ // Legacy path: load the HNSW index from the Lucene directory on disk.
+ return new HnswVectorIndexReader(column, segmentDir,
metadata.getTotalDocs(), indexConfig);
+ }
+
+ // IVF backends accept a PinotDataBuffer; that buffer either comes from
the consolidated
+ // typed entry inside columns.psf (when storeInSegmentFile=true) or from
the legacy combined
+ // file. The chosen reader takes ownership of the buffer and is
responsible for closing it
+ // (including the constructor's own failure path).
+ PinotDataBuffer buffer;
+ if (indexConfig.isStoreInSegmentFile()) {
+ try {
+ buffer = VectorIndexUtils.getConsolidatedVectorEntry(segmentReader,
column);
+ } catch (IOException e) {
+ throw new RuntimeException(
+ "Failed to read consolidated vector index from columns.psf for
column: " + column, e);
+ }
+ if (buffer == null) {
+ LOGGER.warn("Skipping vector index reader for column: {} because
storeInSegmentFile=true "
+ + "but no consolidated entry was found in columns.psf in
segment: {}", column, segmentDir);
+ return null;
+ }
+ } else {
+ File configuredIndexFile =
SegmentDirectoryPaths.findVectorIndexIndexFile(segmentDir, column, indexConfig);
+ if (configuredIndexFile == null || !configuredIndexFile.exists()) {
+ LOGGER.warn("Skipping vector index reader for column: {} because
configured backend {} does not have a "
+ + "matching on-disk artifact in segment: {}", column,
backendType, segmentDir);
+ return null;
+ }
+ buffer = IvfCombinedBuffers.mapCombinedFile(configuredIndexFile,
column,
+ "vector-" + backendType.name().toLowerCase());
}
+ // Buffer ownership: when reading from columns.psf, the segment
directory owns the buffer
+ // and the reader must not close it. Combined mmap buffers are owned by
the reader.
+ boolean ownsBuffer = !indexConfig.isStoreInSegmentFile();
Review Comment:
If IVF gains a fallback to an on-disk sidecar when the consolidated entry is
missing, buffer ownership should be derived from where the buffer actually came
from (`columns.psf` vs. an mmapped sidecar), not from the config flag alone.
Otherwise a sidecar-mapped buffer used as a fallback would be treated as
"borrowed" and leaked.
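A small sketch of carrying ownership alongside the buffer, so the flag is set at the point where the buffer is obtained. `SourcedBuffer` is a hypothetical wrapper, and `Closeable` stands in for `PinotDataBuffer` to keep the example self-contained.

```java
import java.io.Closeable;
import java.io.IOException;

/** Illustrative only: pairs a buffer with the ownership implied by where it came from. */
public final class SourcedBuffer implements Closeable {
  private final Closeable _buffer;
  private final boolean _ownedByReader;

  private SourcedBuffer(Closeable buffer, boolean ownedByReader) {
    _buffer = buffer;
    _ownedByReader = ownedByReader;
  }

  /** Buffer handed out by the segment directory (columns.psf entry): the reader must not close it. */
  public static SourcedBuffer borrowedFromSegmentDirectory(Closeable buffer) {
    return new SourcedBuffer(buffer, false);
  }

  /** Buffer mapped from a sidecar file by the reader factory: the reader must close it. */
  public static SourcedBuffer mappedFromSidecar(Closeable buffer) {
    return new SourcedBuffer(buffer, true);
  }

  public boolean isOwnedByReader() {
    return _ownedByReader;
  }

  @Override
  public void close()
      throws IOException {
    // Close only what this reader mapped itself; borrowed buffers stay open.
    if (_ownedByReader) {
      _buffer.close();
    }
  }
}
```

With this shape, `ownsBuffer` no longer needs to be recomputed from `isStoreInSegmentFile()` after the branch that picked the buffer.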
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/vector/VectorIndexType.java:
##########
@@ -210,25 +211,81 @@ public VectorIndexReader
createIndexReader(SegmentDirectory.Reader segmentReader
return null;
}
VectorBackendType backendType = indexConfig.resolveBackendType();
- File configuredIndexFile =
- SegmentDirectoryPaths.findVectorIndexIndexFile(segmentDir,
metadata.getColumnName(), indexConfig);
- if (configuredIndexFile == null || !configuredIndexFile.exists()) {
- LOGGER.warn("Skipping vector index reader for column: {} because
configured backend {} does not have a "
- + "matching on-disk artifact in segment: {}",
- metadata.getColumnName(), backendType, segmentDir);
- return null;
+ String column = metadata.getColumnName();
+
+ if (backendType == VectorBackendType.HNSW) {
+ // Combined form: load the HNSW index from the typed entry inside
columns.psf when one
+ // actually exists. getConsolidatedVectorEntry (unlike hasIndexFor)
does not mistake the
+ // legacy Lucene directory for a packed entry, so a segment whose
handler has not yet
+ // migrated falls through to the legacy path below instead of failing
the whole load. The
+ // buffer is owned by the segment directory — this reader must not
close it.
+ if (indexConfig.isStoreInSegmentFile()) {
+ PinotDataBuffer buffer;
+ try {
+ buffer =
VectorIndexUtils.getConsolidatedVectorEntry(segmentReader, column);
+ } catch (IOException e) {
+ throw new RuntimeException(
+ "Failed to read consolidated HNSW vector index from
columns.psf for column: " + column, e);
+ }
+ if (buffer != null) {
+ return new HnswVectorIndexReader(column, buffer,
metadata.getTotalDocs(), indexConfig);
+ }
+ LOGGER.warn("storeInSegmentFile=true but no consolidated HNSW entry
found in columns.psf for column: {} "
+ + "in segment: {}; falling back to the on-disk Lucene
directory", column, segmentDir);
+ }
+ // Legacy path: load the HNSW index from the Lucene directory on disk.
+ return new HnswVectorIndexReader(column, segmentDir,
metadata.getTotalDocs(), indexConfig);
+ }
+
+ // IVF backends accept a PinotDataBuffer; that buffer either comes from
the consolidated
+ // typed entry inside columns.psf (when storeInSegmentFile=true) or from
the legacy combined
+ // file. The chosen reader takes ownership of the buffer and is
responsible for closing it
+ // (including the constructor's own failure path).
+ PinotDataBuffer buffer;
+ if (indexConfig.isStoreInSegmentFile()) {
+ try {
+ buffer = VectorIndexUtils.getConsolidatedVectorEntry(segmentReader,
column);
+ } catch (IOException e) {
+ throw new RuntimeException(
+ "Failed to read consolidated vector index from columns.psf for
column: " + column, e);
+ }
+ if (buffer == null) {
+ LOGGER.warn("Skipping vector index reader for column: {} because
storeInSegmentFile=true "
+ + "but no consolidated entry was found in columns.psf in
segment: {}", column, segmentDir);
+ return null;
+ }
+ } else {
+ File configuredIndexFile =
SegmentDirectoryPaths.findVectorIndexIndexFile(segmentDir, column, indexConfig);
+ if (configuredIndexFile == null || !configuredIndexFile.exists()) {
+ LOGGER.warn("Skipping vector index reader for column: {} because
configured backend {} does not have a "
+ + "matching on-disk artifact in segment: {}", column,
backendType, segmentDir);
+ return null;
+ }
+ buffer = IvfCombinedBuffers.mapCombinedFile(configuredIndexFile,
column,
+ "vector-" + backendType.name().toLowerCase());
}
Review Comment:
For IVF backends, when `storeInSegmentFile=true` but the consolidated typed
entry is missing (e.g. before/after a failed absorb), the reader factory
returns null and silently disables the vector index even if a usable on-disk
artifact exists. HNSW already falls back to legacy in this situation; IVF
should do the same so the system can still use the index (and avoid unexpected
exact-scan fallback) until migration completes.
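The fallback order described here can be written as a small decision function. This is only a sketch: `IvfBufferResolver` and `Source` are hypothetical names, and the three booleans stand in for `indexConfig.isStoreInSegmentFile()`, a non-null result from `getConsolidatedVectorEntry(...)`, and an existing file from `findVectorIndexIndexFile(...)`.

```java
/** Illustrative only: where an IVF reader should take its buffer from. */
public final class IvfBufferResolver {
  public enum Source {
    COLUMNS_PSF, SIDECAR, NONE
  }

  private IvfBufferResolver() {
  }

  public static Source resolve(boolean storeInSegmentFile, boolean hasConsolidatedEntry, boolean hasSidecar) {
    if (storeInSegmentFile && hasConsolidatedEntry) {
      return Source.COLUMNS_PSF;
    }
    if (hasSidecar) {
      // Either the legacy configuration, or the absorb into columns.psf has not completed yet.
      return Source.SIDECAR;
    }
    // Nothing usable on disk: only now does the factory skip the reader.
    return Source.NONE;
  }
}
```

Under this ordering, `storeInSegmentFile=true` with a missing typed entry but a present sidecar yields `SIDECAR` instead of disabling the index, matching what the HNSW branch already does.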
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/vector/HnswVectorIndexReader.java:
##########
@@ -86,6 +91,36 @@ public HnswVectorIndexReader(String column, File indexDir,
int numDocs, VectorIn
}
}
+ /**
+ * Buffer-backed constructor: reads the HNSW index from a combined {@link
PinotDataBuffer}
+ * (the {@code LUCENE_V2} packed form produced by {@code
HnswVectorIndexCombined}).
+ *
+ * <p>The buffer is <em>not</em> owned by this reader — closing this reader
does not close the
+ * buffer. The buffer's lifetime must exceed this reader's lifetime; the
segment directory is
+ * responsible for closing it.</p>
+ *
+ * @param column column name
+ * @param indexBuffer combined buffer in LUCENE_V2 format; not owned by this
reader
+ * @param numDocs number of documents in the segment
+ * @param config vector index configuration
+ */
+ public HnswVectorIndexReader(String column, PinotDataBuffer indexBuffer, int
numDocs, VectorIndexConfig config) {
+ _column = column;
+ try {
+ _indexDirectory =
HnswVectorIndexBufferReader.createLuceneDirectory(indexBuffer, column);
+ _indexReader = DirectoryReader.open(_indexDirectory);
+ _indexSearcher = new IndexSearcher(_indexReader);
+
+ // Try to extract the mapping from the packed buffer first; build from
the Lucene index if absent.
+ PinotDataBuffer mappingBuffer =
HnswVectorIndexBufferReader.extractDocIdMappingBuffer(indexBuffer, column);
+ _docIdTranslator = new DocIdTranslator(mappingBuffer, numDocs,
_indexSearcher);
+ } catch (Exception e) {
+ LOGGER.error("Failed to instantiate buffer-backed HNSW index reader for
column {}, exception {}", column,
+ e.getMessage());
+ throw new RuntimeException(e);
+ }
Review Comment:
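The new buffer-backed constructor logs only `e.getMessage()` and drops the
throwable, which loses stack traces in server logs (making segment-load
failures hard to debug). Pass the exception itself as the last argument, e.g.
`LOGGER.error("Failed to instantiate buffer-backed HNSW index reader for column
{}", column, e)`, so that SLF4J logs the full stack trace.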
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/vector/lucene99/HnswVectorIndexCombined.java:
##########
@@ -0,0 +1,347 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.vector.lucene99;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.file.StandardOpenOption;
+import java.util.Map;
+import java.util.TreeMap;
+import javax.annotation.Nullable;
+import org.apache.commons.io.FileUtils;
+import
org.apache.pinot.segment.local.segment.creator.impl.text.LuceneCombinedTextIndexConstants;
+import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.store.SegmentDirectoryPaths;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/// Utility class to pack a Lucene HNSW index directory (and its optional
docId mapping file) into
+/// a single combined file using the {@link
LuceneCombinedTextIndexConstants#MAGIC_NUMBER LUCENE_V2}
+/// layout. Mirrors {@code LuceneTextIndexCombined} — reuses the same format
constants to keep a
+/// single on-disk format shared across text and HNSW vector indexes.
+///
+/// Layout (identical to the text-index LUCENE_V2 format):
+/// ```
+/// [Header]
+/// Magic "LUCENE_V2" 9 bytes
+/// Version 4 bytes (little-endian int)
+/// Total buffer size 8 bytes (little-endian long)
+/// File count 4 bytes (little-endian int)
+/// Reserved 4 bytes
+///
+/// [File metadata, one entry per file]
+/// Name length 2 bytes (little-endian short)
+/// Name variable
+/// File offset 8 bytes (little-endian long)
+/// File size 8 bytes (little-endian long)
+///
+/// [File data]
+/// Raw bytes of each file concatenated in metadata order
+/// ```
+public final class HnswVectorIndexCombined {
+ private static final Logger LOGGER =
LoggerFactory.getLogger(HnswVectorIndexCombined.class);
+
+ private HnswVectorIndexCombined() {
+ }
+
+ /// Packs all files in {@code hnswIndexDir} (plus the optional docId mapping
file) into a single
+ /// combined file at {@code outputFilePath}.
+ ///
+ /// @param hnswIndexDir the Lucene HNSW index directory to pack
+ /// @param outputFilePath destination path for the combined file
+ /// @param segmentIndexDir when non-null, the segment's top-level index
directory; used to
+ /// locate the docId mapping file for inclusion in
the packed output
+ /// @param column column name; used to locate the docId mapping
file when
+ /// {@code segmentIndexDir} is provided
+ /// @throws IOException if any file operations fail
+ public static void combineHnswIndexFiles(File hnswIndexDir, String
outputFilePath,
+ @Nullable File segmentIndexDir, @Nullable String column)
+ throws IOException {
+ if (!hnswIndexDir.exists() || !hnswIndexDir.isDirectory()) {
+ throw new IllegalArgumentException(
+ "HNSW index directory does not exist or is not a directory: " +
hnswIndexDir);
+ }
+
+ LOGGER.info("Combining HNSW index files from directory: {}",
hnswIndexDir.getAbsolutePath());
+
+ Map<String, FileInfo> fileInfoMap = collectFiles(hnswIndexDir,
segmentIndexDir, column);
+ int fileCount = fileInfoMap.size();
+
+ if (fileCount == 0) {
+ throw new IOException("No files found in HNSW index directory: " +
hnswIndexDir);
+ }
+
+ long totalSize = calculateTotalBufferSize(fileInfoMap);
+ if (totalSize > Integer.MAX_VALUE) {
+ throw new IOException("Combined HNSW index size too large: " + totalSize
+ " bytes");
+ }
+
+ File outputFile = new File(outputFilePath);
+ // TRUNCATE_EXISTING so a leftover (larger) file from a previously-crashed
pack does not leave
+ // stale trailing bytes past the new payload — that would inflate the file
length and break the
+ // size-based crash-recovery check in VectorIndexHandler that compares the
combined file length
+ // to the columns.psf typed-entry size.
+ try (FileChannel outputChannel = FileChannel.open(outputFile.toPath(),
StandardOpenOption.CREATE,
+ StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
+ writeHeader(outputChannel, fileCount, (int) totalSize);
+ long dataOffset = LuceneCombinedTextIndexConstants.getHeaderSize() +
calculateMetadataSize(fileInfoMap);
+ writeFileMetadata(outputChannel, fileInfoMap, dataOffset);
+ writeFileData(outputChannel, fileInfoMap);
+ }
+
+ LOGGER.info("Combined {} HNSW index files into: {} ({} bytes)", fileCount,
outputFilePath, totalSize);
+ }
+
+ /// Collects all regular files from {@code hnswIndexDir} and optionally the
docId mapping file
+ /// from the segment's flat directory. Uses a {@link TreeMap} for
deterministic ordering.
+ private static Map<String, FileInfo> collectFiles(File hnswIndexDir,
@Nullable File segmentIndexDir,
+ @Nullable String column)
+ throws IOException {
+ Map<String, FileInfo> fileInfoMap = new TreeMap<>();
+
+ File[] files = hnswIndexDir.listFiles();
+ if (files != null) {
+ for (File file : files) {
+ if (file.isFile()) {
+ fileInfoMap.put(file.getName(), new FileInfo(file, file.getName(),
file.length()));
+ }
+ }
+ }
+
+ // Include the docId mapping file when available. It lives beside the HNSW
directory, not
+ // inside it (same convention as text-index).
+ if (segmentIndexDir != null && column != null) {
+ File segmentDir =
SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir);
+ File mappingFile = new File(segmentDir,
+ column +
V1Constants.Indexes.VECTOR_HNSW_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+ if (mappingFile.exists() && mappingFile.isFile()) {
+ fileInfoMap.put(mappingFile.getName(), new FileInfo(mappingFile,
mappingFile.getName(), mappingFile.length()));
+ LOGGER.info("Including docId mapping file: {} ({} bytes)",
mappingFile.getName(), mappingFile.length());
+ }
+ }
+
+ return fileInfoMap;
+ }
+
+ /// Extracts files packed inside a combined HNSW file back into a Lucene
directory.
+ ///
+ /// This is the inverse of {@link #combineHnswIndexFiles}. The docId mapping
file (if present)
+ /// is extracted into {@code targetDir} alongside the Lucene index files;
the caller is
+ /// responsible for moving it to its canonical location if needed.
+ ///
+ /// @param combinedFile source combined file (LUCENE_V2 layout)
+ /// @param targetDir destination directory; created if absent
+ /// @throws IOException if any file operations fail
+ public static void extractHnswIndexFiles(File combinedFile, File targetDir)
+ throws IOException {
+ if (!combinedFile.exists() || !combinedFile.isFile()) {
+ throw new IllegalArgumentException("Combined file does not exist or is
not a file: " + combinedFile);
+ }
+ FileUtils.forceMkdir(targetDir);
+
+ try (FileChannel inputChannel = FileChannel.open(combinedFile.toPath(),
StandardOpenOption.READ)) {
+ // Parse header
+ byte[] magicBytes = new
byte[LuceneCombinedTextIndexConstants.MAGIC_NUMBER_LENGTH];
+ readFully(inputChannel, ByteBuffer.wrap(magicBytes));
+ String magic = new String(magicBytes);
+ if (!LuceneCombinedTextIndexConstants.MAGIC_NUMBER.equals(magic)) {
+ throw new IOException("Invalid magic number in combined HNSW file: " +
magic);
+ }
+
+ ByteBuffer intBuf =
ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
+ readFully(inputChannel, intBuf);
+ intBuf.flip();
+ int version = intBuf.getInt();
+ if (version != LuceneCombinedTextIndexConstants.VERSION) {
+ throw new IOException("Unsupported version in combined HNSW file: " +
version);
+ }
+
+ // Skip total size (8 bytes) and file count (4 bytes) header fields
+ ByteBuffer longBuf =
ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
+ readFully(inputChannel, longBuf); // totalSize
+ longBuf.flip();
+ // (unused, but we advance the channel position past it)
+
+ intBuf =
ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
+ readFully(inputChannel, intBuf);
+ intBuf.flip();
+ int fileCount = intBuf.getInt();
+
+ // Skip reserved field
+ intBuf =
ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
+ readFully(inputChannel, intBuf);
+
+ // Parse file metadata
+ String[] fileNames = new String[fileCount];
+ long[] fileOffsets = new long[fileCount];
+ long[] fileSizes = new long[fileCount];
+ for (int i = 0; i < fileCount; i++) {
+ ByteBuffer shortBuf =
ByteBuffer.allocate(Short.BYTES).order(ByteOrder.LITTLE_ENDIAN);
+ readFully(inputChannel, shortBuf);
+ shortBuf.flip();
+ short nameLength = shortBuf.getShort();
+
+ byte[] nameBytes = new byte[nameLength];
+ readFully(inputChannel, ByteBuffer.wrap(nameBytes));
+ fileNames[i] = new String(nameBytes);
+
+ longBuf =
ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
+ readFully(inputChannel, longBuf);
+ longBuf.flip();
+ fileOffsets[i] = longBuf.getLong();
+
+ longBuf =
ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
+ readFully(inputChannel, longBuf);
+ longBuf.flip();
+ fileSizes[i] = longBuf.getLong();
+ }
+
+ // Extract each file by seeking to its offset and copying bytes
+ for (int i = 0; i < fileCount; i++) {
+ File outFile = new File(targetDir, fileNames[i]);
+ long fileSize = fileSizes[i];
+ try (FileChannel outChannel = FileChannel.open(outFile.toPath(),
StandardOpenOption.CREATE,
+ StandardOpenOption.WRITE)) {
+ long remaining = fileSize;
+ long srcOffset = fileOffsets[i];
+ while (remaining > 0) {
+ long transferred = inputChannel.transferTo(srcOffset, remaining,
outChannel);
+ srcOffset += transferred;
+ remaining -= transferred;
+ }
Review Comment:
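`extractHnswIndexFiles` loops on `FileChannel.transferTo` without a progress
check. `transferTo` transfers no bytes once the position is at or past the end
of the source file, so if the combined file is corrupt (an offset or size that
points past EOF) the call keeps returning 0 and the loop never terminates
during segment load/migration. Validate each entry's offset and size against
the file length, and fail when a call transfers nothing.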
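A sketch of a bounded copy loop, assuming that failing fast with an `IOException` is acceptable for a corrupt combined file. `BoundedTransfer.copyRange` is a hypothetical helper, not part of the PR. Treating a zero-byte transfer as an error is reasonable here because both channels are blocking file channels.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

/** Illustrative only: copies a byte range and fails instead of spinning on a bad range. */
public final class BoundedTransfer {
  private BoundedTransfer() {
  }

  public static void copyRange(FileChannel src, long offset, long size, WritableByteChannel dst)
      throws IOException {
    long srcSize = src.size();
    // Written as "offset > srcSize - size" so the check cannot overflow for large values.
    if (offset < 0 || size < 0 || offset > srcSize - size) {
      throw new IOException(
          "Entry at offset " + offset + " with size " + size + " does not fit in source of size " + srcSize);
    }
    long position = offset;
    long remaining = size;
    while (remaining > 0) {
      long transferred = src.transferTo(position, remaining, dst);
      if (transferred <= 0) {
        throw new IOException("No progress at position " + position + " with " + remaining + " bytes remaining");
      }
      position += transferred;
      remaining -= transferred;
    }
  }
}
```

The up-front range check catches bad metadata before any bytes are written; the in-loop check covers the case where the source shrinks or the transfer stalls midway.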