Hi tanghaodong25, I see you're planning to support storage and Lance formats. I
previously contributed to the Lance community through my GitHub account,
kaori-seasons. Here's a basic Lance usage guide.

PR: https://github.com/apache/geaflow/pull/637

Lance Format Read/Write Demonstration

Lance provides a Java binding based on JNI, mainly using the
LanceFileReader and LanceFileWriter classes to implement file-level read and
write operations.

Complete Demonstration Code

```java
import com.lancedb.lance.file.LanceFileReader;
import com.lancedb.lance.file.LanceFileWriter;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.dictionary.DictionaryProvider;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.arrow.vector.util.Text;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;

public class LanceFileDemo {

    public static void main(String[] args) throws Exception {
        String filePath = "/tmp/demo.lance";
        // Write to Lance file
        writeToLanceFile(filePath);
        // Read from Lance file
        readFromLanceFile(filePath);
    }

    // Declared as "throws Exception" because closing an AutoCloseable may throw a checked Exception
    private static void writeToLanceFile(String filePath) throws Exception {
        try (BufferAllocator allocator = new RootAllocator()) {
            // Create Schema
            Schema schema = new Schema(Arrays.asList(
                    Field.nullable("id", new ArrowType.Int(64, true)),
                    Field.nullable("name", new ArrowType.Utf8())
            ));
            // Create data
            try (VectorSchemaRoot batch = VectorSchemaRoot.create(schema, allocator)) {
                BigIntVector idVector = (BigIntVector) batch.getVector("id");
                VarCharVector nameVector = (VarCharVector) batch.getVector("name");
                // Populate data
                idVector.setSafe(0, 1L);
                idVector.setSafe(1, 2L);
                idVector.setSafe(2, 3L);
                nameVector.setSafe(0, new Text("Alice"));
                nameVector.setSafe(1, new Text("Bob"));
                nameVector.setSafe(2, new Text("Charlie"));
                // Set the row count once all values are in place
                batch.setRowCount(3);
                // Write to file
                try (LanceFileWriter writer = LanceFileWriter.open(
                        filePath,
                        allocator,
                        // Empty provider: this schema has no dictionary-encoded fields
                        new DictionaryProvider.MapDictionaryProvider()
                )) {
                    writer.write(batch);
                }
            }
        }
        System.out.println("Data has been written to Lance file: " + filePath);
    }

    private static void readFromLanceFile(String filePath) throws Exception {
        try (BufferAllocator allocator = new RootAllocator()) {
            try (LanceFileReader reader = LanceFileReader.open(filePath, allocator)) {
                System.out.println("File number of rows: " + reader.numRows());
                System.out.println("File Schema: " + reader.getSchema());
                // Read all data
                try (ArrowReader arrowReader = reader.readAll(1024, null, null)) {
                    while (arrowReader.loadNextBatch()) {
                        VectorSchemaRoot root = arrowReader.getVectorSchemaRoot();
                        System.out.println("Read " + root.getRowCount() + " rows of data");
                        // Look up the column vectors once per batch rather than once per row
                        BigIntVector idVector = (BigIntVector) root.getVector("id");
                        VarCharVector nameVector = (VarCharVector) root.getVector("name");
                        // Print data
                        for (int i = 0; i < root.getRowCount(); i++) {
                            System.out.println("Row " + i + ": id=" + idVector.get(i)
                                    + ", name=" + nameVector.getObject(i));
                        }
                    }
                }
            }
        }
    }
}
```

Key Component Descriptions

- LanceFileWriter: Used for writing Lance format files.
- LanceFileReader: Used for reading Lance format files.
- JNI Implementation: The underlying implementation is in Rust, with Java calling it via JNI.

Maven Dependency

```xml
<dependency>
    <groupId>com.lancedb</groupId>
    <artifactId>lance-core</artifactId>
    <version>0.18.0</version>
</dependency>
```
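
One note on the demo code: it nests try-with-resources so that the writer is closed before the batch, and the batch before the allocator. Java closes resources in the reverse of the order they were opened, which matters here because the allocator should be closed last, after everything allocated from it has been released. Below is a minimal stand-alone sketch of that ordering. `CloseOrderDemo` and `Tracked` are hypothetical stand-ins I made up for illustration; they are not Lance or Arrow classes, and the sketch only uses the JDK.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {

    // Hypothetical stand-in for an allocator, batch, or writer:
    // it only records when it is opened and when it is closed.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Mirrors the nesting used in the write path of the demo and returns the event log.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked allocator = new Tracked("allocator", log)) {
            try (Tracked batch = new Tracked("batch", log)) {
                try (Tracked writer = new Tracked("writer", log)) {
                    log.add("write");
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running `main` prints the open events outermost first, then `write`, then the close events innermost first (writer, batch, allocator). The same nesting is what keeps the allocator alive for as long as the batch and the writer need it.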