[GitHub] [arrow] davisusanibar commented on a diff in pull request #14382: ARROW-17789: [Java][Docs] Update Java Dataset documentation with latest changes

GitBox Wed, 12 Oct 2022 13:11:49 -0700


davisusanibar commented on code in PR #14382:
URL: https://github.com/apache/arrow/pull/14382#discussion_r993854080



##########
docs/source/java/dataset.rst:
##########
@@ -32,31 +32,50 @@ is not designed only for querying files but can be extended 
to serve all
 possible data sources such as from inter-process communication or from other
 network locations, etc.
 
+.. contents::
+
 Getting Started
 ===============
 
+Currently supported file formats are:
+
+- Apache Arrow (`.arrow`)

Review Comment:
   Changed



##########
docs/source/java/dataset.rst:
##########
@@ -32,31 +32,50 @@ is not designed only for querying files but can be extended 
to serve all
 possible data sources such as from inter-process communication or from other
 network locations, etc.
 
+.. contents::
+
 Getting Started
 ===============
 
+Currently supported file formats are:
+
+- Apache Arrow (`.arrow`)
+- Apache ORC (`.orc`)
+- Apache Parquet (`.parquet`)
+- Comma-Separated Values (`.csv`)
+
 Below shows a simplest example of using Dataset to query a Parquet file in 
Java:
 
 .. code-block:: Java
 
     // read data from file /opt/example.parquet
     String uri = "file:/opt/example.parquet";
-    BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
-    DatasetFactory factory = new FileSystemDatasetFactory(allocator,
-        NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
-    Dataset dataset = factory.finish();
-    Scanner scanner = dataset.newScan(new ScanOptions(100)));
-    List<ArrowRecordBatch> batches = StreamSupport.stream(
-        scanner.scan().spliterator(), false)
-            .flatMap(t -> stream(t.execute()))
-            .collect(Collectors.toList());
-
-    // do something with read record batches, for example:
-    analyzeArrowData(batches);
-
-    // finished the analysis of the data, close all resources:
-    AutoCloseables.close(batches);
-    AutoCloseables.close(factory, dataset, scanner);
+    ScanOptions options = new ScanOptions(/*batchSize*/ 5);

Review Comment:
   Changed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] davisusanibar commented on a diff in pull request #14382: ARROW-17789: [Java][Docs] Update Java Dataset documentation with latest changes

Reply via email to