pabloem commented on pull request #15848: URL: https://github.com/apache/beam/pull/15848#issuecomment-1029488107
Regarding your comment in https://github.com/apache/beam/pull/15848#discussion_r798790351: yeah, SDF is interesting here, and I am not 100% sure how to approach it. But yes, we need ordering, so the implementation would be something like:

```
resultSet = query.execute();
while (resultSet.next()) {
  // Stop once the tracker refuses the key: the rest belongs to a split-off range.
  if (!tryClaim(resultSet.get(key))) {
    return DONE;
  }
  c.output(format(resultSet));
}
return DONE;
```

and the split would be something like:

```
tryClaim(key) {
  // Refuse keys past the (possibly shrunk) end of this range.
  if (key >= endOfRange) {
    return false;
  }
  this.latestKey = key;
  return true;
}

trySplit() {
  // Generate two ranges between the current key and the last key
  ranges = generateRanges(2, latestKey, endOfRange);
  return new SplitResult(ranges.currentRange, ranges.nextRange);
}
```

The issue is that the query over the full range would already be executing in the database, so after a split we may duplicate work on the database. I'm not sure how bad that is : )
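
To make the claim/split bookkeeping above concrete, here is a minimal, database-free sketch. It assumes integer keys that arrive in increasing order and a half-open range `[start, end)`. `KeyRangeTracker` and its methods are hypothetical names for illustration only; this is not Beam's `RestrictionTracker` API and not necessarily what the PR would implement.

```java
import java.util.Optional;

/** Hypothetical tracker over integer keys in the half-open range [start, end). */
final class KeyRangeTracker {
    private long end;        // exclusive upper bound; shrinks when a split succeeds
    private long latestKey;  // last key claimed, or start - 1 if nothing was claimed yet

    KeyRangeTracker(long start, long end) {
        this.end = end;
        this.latestKey = start - 1;
    }

    /** Returns true and records the key if it still belongs to this range. */
    synchronized boolean tryClaim(long key) {
        if (key <= latestKey) {
            // The sketch relies on ordered results, as the comment above requires.
            throw new IllegalStateException("keys must be claimed in increasing order");
        }
        if (key >= end) {
            return false; // key belongs to a split-off range (or is past the end)
        }
        latestKey = key;
        return true;
    }

    /**
     * Splits the unclaimed keys roughly in half. The tracker keeps the first
     * half; the returned array {from, to} is the residual range for another worker.
     */
    synchronized Optional<long[]> trySplit() {
        long firstUnclaimed = latestKey + 1;
        if (end - firstUnclaimed < 2) {
            return Optional.empty(); // too little work left to split
        }
        long mid = firstUnclaimed + (end - firstUnclaimed) / 2;
        long[] residual = {mid, end};
        end = mid;
        return Optional.of(residual);
    }

    synchronized long currentEnd() {
        return end;
    }
}
```

In this sketch, the reader that owns the tracker keeps consuming its already-running query and simply stops at the first key the tracker refuses, while whoever receives the residual range would have to issue a new query for it. That is where the possible duplicated database work mentioned above comes from.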
