[GitHub] [arrow] eerhardt commented on a change in pull request #12068: ARROW-15037: [C#] A stream processing example of IoT sensor data

GitBox Wed, 19 Jan 2022 08:19:08 -0800


eerhardt commented on a change in pull request #12068:
URL: https://github.com/apache/arrow/pull/12068#discussion_r787918349




##########
File path: csharp/examples/IoTDataPipelineExample/Program.cs
##########
@@ -75,14 +75,23 @@ public static async Task Main(string[] args)
                                     + recordBatch.Column(j).Data.NullCount);
                             }
 
-                            if (recordBatch.Schema.HasMetadata && recordBatch.Schema.Metadata.TryGetValue("SubjectId", out string subjectId))
+                            var col = (Int32Array)recordBatch.Column(0);
+                            var subjectId = col.Values[0].ToString();
+
+                            if (!recordBatchDict.ContainsKey(subjectId))
                             {
-                                if (!recordBatchDict.ContainsKey(subjectId))
-                                {
-                                    recordBatchDict.Add(subjectId, new List<RecordBatch>());
-                                }
-                                recordBatchDict[subjectId].Add(recordBatch);
+                                recordBatchDict.Add(subjectId, new List<RecordBatch>());
                             }
+                            recordBatchDict[subjectId].Add(recordBatch);
+
+                            //if (recordBatch.Schema.HasMetadata && recordBatch.Schema.Metadata.TryGetValue("SubjectId", out string subjectId))

Review comment:
   If you look at the "STREAMING FORMAT" section of
https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format, you
will see that the SCHEMA is written once, at the beginning of the stream,
outside of any RECORD BATCH. Thus every record batch in a file / stream must
have the same schema, including the same schema metadata.

   But your code is trying to attach metadata to the schema that is specific to
each record batch. To accomplish that, you would need a separate file / Arrow
stream for each distinct subject ID.
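
   To illustrate the "one stream per subject ID" option, here is a rough sketch,
not a drop-in replacement for the example. The column name "Value", the class
`PerSubjectStreams`, and the helpers `SchemaFor` and `WriteAsync` are made up
for illustration. It assumes the `Apache.Arrow` NuGet package and a single
non-nullable Int32 column:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Apache.Arrow.Types;

public static class PerSubjectStreams
{
    // Hypothetical helper: a one-column schema whose metadata names the subject.
    // The metadata lives on the schema, so it applies to the whole stream.
    private static Schema SchemaFor(string subjectId) =>
        new Schema.Builder()
            .Field(f => f.Name("Value").DataType(Int32Type.Default).Nullable(false))
            .Metadata("SubjectId", subjectId)
            .Build();

    // Writes every batch for ONE subject into ONE stream.
    // Call this once per subject ID, each time with a different output stream.
    public static async Task WriteAsync(
        string subjectId, IEnumerable<int[]> batches, Stream output)
    {
        Schema schema = SchemaFor(subjectId);

        // The writer emits the schema once, before the first record batch.
        using var writer = new ArrowStreamWriter(output, schema, leaveOpen: true);

        foreach (int[] values in batches)
        {
            Int32Array column = new Int32Array.Builder().AppendRange(values).Build();
            var batch = new RecordBatch(schema, new IArrowArray[] { column }, values.Length);
            await writer.WriteRecordBatchAsync(batch);
        }

        await writer.WriteEndAsync();
    }
}
```

   On the reading side, each stream then yields batches whose
`Schema.Metadata` contains the same "SubjectId" entry. The alternative is to
keep a single stream and carry the subject ID as a regular column, which is
what the updated code in this diff does.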




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
