davisusanibar commented on code in PR #14382:
URL: https://github.com/apache/arrow/pull/14382#discussion_r993854080


##########
docs/source/java/dataset.rst:
##########
@@ -32,31 +32,50 @@ is not designed only for querying files but can be extended to serve all
 possible data sources such as from inter-process communication or from other
 network locations, etc.
 
+.. contents::
+
 Getting Started
 ===============
 
+Currently supported file formats are:
+
+- Apache Arrow (`.arrow`)

Review Comment:
   Changed



##########
docs/source/java/dataset.rst:
##########
@@ -32,31 +32,50 @@ is not designed only for querying files but can be extended to serve all
 possible data sources such as from inter-process communication or from other
 network locations, etc.
 
+.. contents::
+
 Getting Started
 ===============
 
+Currently supported file formats are:
+
+- Apache Arrow (`.arrow`)
+- Apache ORC (`.orc`)
+- Apache Parquet (`.parquet`)
+- Comma-Separated Values (`.csv`)
+
 Below shows a simplest example of using Dataset to query a Parquet file in Java:
 
 .. code-block:: Java
 
     // read data from file /opt/example.parquet
     String uri = "file:/opt/example.parquet";
-    BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
-    DatasetFactory factory = new FileSystemDatasetFactory(allocator,
-        NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
-    Dataset dataset = factory.finish();
-    Scanner scanner = dataset.newScan(new ScanOptions(100)));
-    List<ArrowRecordBatch> batches = StreamSupport.stream(
-        scanner.scan().spliterator(), false)
-            .flatMap(t -> stream(t.execute()))
-            .collect(Collectors.toList());
-
-    // do something with read record batches, for example:
-    analyzeArrowData(batches);
-
-    // finished the analysis of the data, close all resources:
-    AutoCloseables.close(batches);
-    AutoCloseables.close(factory, dataset, scanner);
+    ScanOptions options = new ScanOptions(/*batchSize*/ 5);

Review Comment:
   Changed


