eerhardt commented on a change in pull request #12068:
URL: https://github.com/apache/arrow/pull/12068#discussion_r787918349
##########
File path: csharp/examples/IoTDataPipelineExample/Program.cs
##########
@@ -75,14 +75,23 @@ public static async Task Main(string[] args)
+ recordBatch.Column(j).Data.NullCount);
}
- if (recordBatch.Schema.HasMetadata && recordBatch.Schema.Metadata.TryGetValue("SubjectId", out string subjectId))
+ var col = (Int32Array)recordBatch.Column(0);
+ var subjectId = col.Values[0].ToString();
+
+ if (!recordBatchDict.ContainsKey(subjectId))
{
- if (!recordBatchDict.ContainsKey(subjectId))
- {
- recordBatchDict.Add(subjectId, new List<RecordBatch>());
- }
- recordBatchDict[subjectId].Add(recordBatch);
+ recordBatchDict.Add(subjectId, new List<RecordBatch>());
}
+ recordBatchDict[subjectId].Add(recordBatch);
+
+ //if (recordBatch.Schema.HasMetadata && recordBatch.Schema.Metadata.TryGetValue("SubjectId", out string subjectId))
Review comment:
If you look at
https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format, you
will see that the SCHEMA is written once, at the beginning of the stream (in
the "STREAMING FORMAT" section), outside of every RECORD BATCH. Thus every
record batch in a file has to share the same schema.
But your code is trying to add metadata to the schema that is specific to
each record batch. To accomplish this, you would need a separate file
/ arrow stream for each distinct subject ID.
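
To illustrate the "one stream per subject ID" option, here is a minimal sketch. The class name, the `BuildSchema` helper, the single `Reading` column, and the `subject-<id>.arrow` file naming are all hypothetical and not taken from the PR; the point is only that each stream gets its own schema, so per-subject metadata has a place to live.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Apache.Arrow.Types;

public static class PerSubjectStreamExample
{
    // Hypothetical helper: builds one schema per subject and stores the
    // subject ID in the schema-level metadata. The "Reading" column is an
    // assumed placeholder for the real sensor columns.
    private static Schema BuildSchema(string subjectId) =>
        new Schema.Builder()
            .Field(f => f.Name("Reading").DataType(Int32Type.Default).Nullable(false))
            .Metadata("SubjectId", subjectId)
            .Build();

    // Writes one Arrow stream file per subject ID. Each file starts with its
    // own schema, so the metadata can differ from file to file.
    public static async Task WriteAsync(
        IReadOnlyDictionary<string, int[]> readingsBySubject, string directory)
    {
        foreach (KeyValuePair<string, int[]> entry in readingsBySubject)
        {
            Schema schema = BuildSchema(entry.Key);

            // Build a single-column record batch from this subject's readings.
            var builder = new Int32Array.Builder();
            foreach (int value in entry.Value)
            {
                builder.Append(value);
            }
            Int32Array readings = builder.Build();
            var batch = new RecordBatch(
                schema, new IArrowArray[] { readings }, readings.Length);

            // Assumed naming convention: one file per subject.
            string path = Path.Combine(directory, $"subject-{entry.Key}.arrow");
            using (FileStream stream = File.Create(path))
            using (var writer = new ArrowStreamWriter(stream, schema))
            {
                await writer.WriteRecordBatchAsync(batch);
                await writer.WriteEndAsync();
            }
        }
    }
}
```

The alternative taken in this diff, keeping a single stream and carrying the subject ID as an ordinary column, avoids the extra files, at the cost of repeating the ID in every row.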