
GitBox Sun, 03 Jan 2021 10:48:30 -0800


Dandandan commented on a change in pull request #9084:
URL: https://github.com/apache/arrow/pull/9084#discussion_r551041037




##########
File path: rust/arrow/src/csv/reader.rs
##########
@@ -394,88 +393,116 @@ impl<R: Read> Iterator for Reader<R> {
     }
 }
 
-/// parses a slice of [csv_crate::StringRecord] into a [array::record_batch::RecordBatch].
-fn parse(
+/// Tries to create an [array::Array] from a slice of [csv_crate::StringRecord] by interpreting its
+/// values at column `column_index` to be of `data_type`.
+/// `line_number` is where the set of rows starts at, and is only used to report the line number in case of errors.
+/// # Error
+/// This function errors iff:
+/// * _any_ entry from `rows` at `column_index` cannot be parsed into the DataType.
+/// * The [array::datatypes::DataType] is not supported.
+pub fn build_array(
+    rows: &[StringRecord],
+    data_type: &DataType,
+    line_number: usize,
+    column_index: usize,
+) -> Result<ArrayRef> {
+    match data_type {
+        DataType::Boolean => build_boolean_array(line_number, rows, column_index),
+        DataType::Int8 => {
+            build_primitive_array::<Int8Type>(line_number, rows, column_index)
+        }
+        DataType::Int16 => {
+            build_primitive_array::<Int16Type>(line_number, rows, column_index)
+        }
+        DataType::Int32 => {
+            build_primitive_array::<Int32Type>(line_number, rows, column_index)
+        }
+        DataType::Int64 => {
+            build_primitive_array::<Int64Type>(line_number, rows, column_index)
+        }
+        DataType::UInt8 => {
+            build_primitive_array::<UInt8Type>(line_number, rows, column_index)
+        }
+        DataType::UInt16 => {
+            build_primitive_array::<UInt16Type>(line_number, rows, column_index)
+        }
+        DataType::UInt32 => {
+            build_primitive_array::<UInt32Type>(line_number, rows, column_index)
+        }
+        DataType::UInt64 => {
+            build_primitive_array::<UInt64Type>(line_number, rows, column_index)
+        }
+        DataType::Float32 => {
+            build_primitive_array::<Float32Type>(line_number, rows, column_index)
+        }
+        DataType::Float64 => {
+            build_primitive_array::<Float64Type>(line_number, rows, column_index)
+        }
+        DataType::Date32(_) => {
+            build_primitive_array::<Date32Type>(line_number, rows, column_index)
+        }
+        DataType::Date64(_) => {
+            build_primitive_array::<Date64Type>(line_number, rows, column_index)
+        }
+        DataType::Timestamp(TimeUnit::Microsecond, _) => {
+            build_primitive_array::<TimestampMicrosecondType>(
+                line_number,
+                rows,
+                column_index,
+            )
+        }
+        DataType::Timestamp(TimeUnit::Nanosecond, _) => build_primitive_array::<
+            TimestampNanosecondType,
+        >(
+            line_number, rows, column_index
+        ),
+        DataType::Utf8 => Ok(Arc::new(
+            rows.iter()
+                .map(|row| row.get(column_index))
+                .collect::<StringArray>(),
+        ) as ArrayRef),
+        other => Err(ArrowError::ParseError(format!(
+            "Unsupported data type {:?}",
+            other
+        ))),
+    }
+}
+
+/// Tries to create an [array::record_batch::RecordBatch] from a slice of [csv_crate::StringRecord] by interpreting
+/// each of its columns according to `fields`. When `projection` is not None, it is used to select a subset of `fields` to
+/// parse.
+/// `line_number` is where the set of rows starts at, and is only used to report the line number in case of errors.
+/// # Error
+/// This function errors iff:
+/// * _any_ entry from `rows` cannot be parsed into its corresponding field's `DataType`.
+/// * Any of the fields' [array::datatypes::DataType] is not supported.
+/// # Panic
+/// This function panics if any index in `projection` is larger than `fields.len()`.
+pub fn build_batch(
     rows: &[StringRecord],
     fields: &[Field],
     projection: &Option<Vec<usize>>,
     line_number: usize,
 ) -> Result<RecordBatch> {
     let projection: Vec<usize> = match projection {
         Some(ref v) => v.clone(),
-        None => fields.iter().enumerate().map(|(i, _)| i).collect(),
+        None => (0..fields.len()).collect(),

Review comment:
       👍 I think the `v.clone()` could maybe even be removed?
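
One way the clone might be avoided, sketched outside the Arrow code base: borrow the caller's indices when a projection is supplied and allocate only in the `None` case. The helper names `resolve_projection` and `selected_names` are hypothetical and not part of the PR, and the sketch assumes the rest of `build_batch` only needs to iterate over the indices rather than own a `Vec<usize>`.

```rust
use std::borrow::Cow;

/// Hypothetical helper: turns an optional projection into the column indices
/// to read. A supplied projection is borrowed, not cloned; only the `None`
/// case allocates a fresh `0..num_fields` index list.
fn resolve_projection(
    projection: &Option<Vec<usize>>,
    num_fields: usize,
) -> Cow<'_, [usize]> {
    match projection {
        Some(v) => Cow::Borrowed(v.as_slice()),
        None => Cow::Owned((0..num_fields).collect()),
    }
}

/// Hypothetical stand-in for the per-column work in `build_batch`: it only
/// iterates over the resolved indices, so a borrowed slice is sufficient.
/// Like the function under review, it panics on an out-of-range index.
fn selected_names<'a>(
    names: &[&'a str],
    projection: &Option<Vec<usize>>,
) -> Vec<&'a str> {
    let indices = resolve_projection(projection, names.len());
    indices.iter().map(|&i| names[i]).collect()
}
```

Whether this is worth it depends on how the indices are consumed further down; if some later step needs an owned `Vec<usize>`, the clone (or `Cow::into_owned`) would still be required at that point.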





[GitHub] [arrow] Dandandan commented on a change in pull request #9084: ARROW-11119: [Rust] Expose functions to parse a single CSV column / StringRecord into an array / recordBatch
