rtpsw commented on code in PR #34311:
URL: https://github.com/apache/arrow/pull/34311#discussion_r1127068282
##########
cpp/src/arrow/compute/row/grouper.h:
##########
@@ -30,6 +30,75 @@
namespace arrow {
namespace compute {
+/// \brief A segment
+/// A segment group is a chunk of continuous rows that have the same segment key. (For
+/// example, in ordered time series processing, the segment key can be "date", and a segment
+/// group can be all the rows that belong to the same date.) A segment group can span
+/// across multiple exec batches. A segment is a chunk of continuous rows that have the same
+/// segment key within a given batch. When a segment group spans across batches, it will
+/// have multiple segments. A segment never spans across batches. The segment data
+/// structure only makes sense when used along with an exec batch.
+struct ARROW_EXPORT Segment {
+ /// \brief the offset into the batch where the segment starts
+ int64_t offset;
+ /// \brief the length of the segment
+ int64_t length;
+ /// \brief whether the segment may be extended by a next one
+ bool is_open;
+ /// \brief whether the segment extends a preceding one
+ bool extends;
+};
+
+inline bool operator==(const Segment& segment1, const Segment& segment2) {
Review Comment:
What would be the advantage of that for such a simple class? In any case, this seems
like a matter of Arrow conventions. I'll defer to @westonpace.
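To make the field semantics in the quoted doc comment concrete, here is a minimal C++ sketch, not Arrow's implementation. It assumes integer segment keys held in a `std::vector<int>` and a hypothetical `SplitBatch` helper that is not part of Arrow's API. It also assumes that only the last segment of a batch is open (the next batch might continue it) and that a batch with no predecessor cannot extend anything. The struct omits `ARROW_EXPORT`, and `operator==` compares all four fields member-wise, which is one plausible reading of the operator under review.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the fields of the Segment struct in the diff (ARROW_EXPORT omitted).
struct Segment {
  int64_t offset;  // offset into the batch where the segment starts
  int64_t length;  // number of rows in the segment
  bool is_open;    // may be extended by a segment in the next batch
  bool extends;    // continues a segment from the previous batch
};

// Member-wise equality, in the same free-function form as the quoted operator==.
inline bool operator==(const Segment& segment1, const Segment& segment2) {
  return segment1.offset == segment2.offset && segment1.length == segment2.length &&
         segment1.is_open == segment2.is_open && segment1.extends == segment2.extends;
}

// Hypothetical helper: splits one batch of segment keys into segments.
// `has_prev` and `prev_key` describe the last key of the preceding batch, if any.
std::vector<Segment> SplitBatch(const std::vector<int>& keys, bool has_prev,
                                int prev_key) {
  std::vector<Segment> segments;
  const int64_t n = static_cast<int64_t>(keys.size());
  int64_t start = 0;
  while (start < n) {
    // Advance `end` past every row whose key equals the key at `start`.
    int64_t end = start + 1;
    while (end < n && keys[end] == keys[start]) ++end;
    // Only the first segment of a batch can continue the previous batch's group.
    const bool extends = (start == 0) && has_prev && prev_key == keys[0];
    // Only the last segment of a batch may be continued by the next batch.
    const bool is_open = (end == n);
    segments.push_back({start, end - start, is_open, extends});
    start = end;
  }
  // Segments tile the batch: their lengths sum to the batch length.
  int64_t total = 0;
  for (const Segment& s : segments) total += s.length;
  assert(total == n);
  return segments;
}
```

For example, splitting the keys `{1, 1, 2, 2, 2}` and then `{2, 3, 3}` (with previous key `2`) yields an open segment with offset `2` and length `3` at the end of the first batch, and a segment with offset `0`, length `1`, and `extends == true` at the start of the second. The segment group for key `2` is therefore made of two segments, one per batch.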