[
https://issues.apache.org/jira/browse/BEAM-1893?focusedWorklogId=219956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219956
]
ASF GitHub Bot logged work on BEAM-1893:
----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Mar/19 12:28
Start Date: 28/Mar/19 12:28
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #8152:
[DoNotMerge][BEAM-1893] Implementation of CouchbaseIO
URL: https://github.com/apache/beam/pull/8152#issuecomment-477574205
I would be surprised if there were no way to estimate the size of the
collection in Couchbase, for example
[couch_total_disk_size](https://docs.couchbase.com/server/6.0/rest-api/rest-bucket-stats.html)
or something similar.
I am definitely in favor of changing it to a ParDo-based approach in any case. I
just thought that the exercise of writing the IO based on the `BoundedSource` API
was worthwhile, and I had thought about doing this next, but why not. Remember,
however, that partitioning should be done properly in this case too (you can
also take a look at SolrIO for an example).
Final note: if you reimplement it based on ParDo, it is probably worth
implementing `readAll` as the basis of read; for an example, take a look at
`HBaseIO.readAll`. The idea is to pass a collection of `queries` so that it is
composable with only ParDos, and to produce the transform from this. Notice that
`HBaseIO.readAll` uses the annotations of the new IO API (SplittableDoFn);
PLEASE DON'T USE THOSE! However, do produce the splits with a
`RestrictionTracker` (ByteKey or OffsetRange, or whatever they use to represent
ranges in Couchbase); this is a must for future evolution towards SDF. If you
have doubts, don't hesitate to ask me, @EdgarLGB.
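
To make the partitioning point concrete, here is a minimal sketch in plain Java with no Beam dependency. It assumes a query's result set can be addressed by numeric offsets, which may not match how Couchbase actually represents ranges. `Range` and `split` are hypothetical names that only stand in for an OffsetRange-style restriction; they are not the Beam API.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplitSketch {
  /** Half-open range [from, to) of offsets; a hypothetical stand-in for an OffsetRange-like type. */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      if (from > to) {
        throw new IllegalArgumentException("from must be <= to");
      }
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /** Cuts a range into consecutive sub-ranges holding at most desiredPerSplit offsets each. */
  static List<Range> split(Range range, long desiredPerSplit) {
    if (desiredPerSplit <= 0) {
      throw new IllegalArgumentException("desiredPerSplit must be positive");
    }
    List<Range> splits = new ArrayList<>();
    long start = range.from;
    while (start < range.to) {
      // Compare against the remaining length so start + desiredPerSplit cannot overflow.
      long end = (range.to - start <= desiredPerSplit) ? range.to : start + desiredPerSplit;
      splits.add(new Range(start, end));
      start = end;
    }
    return splits;
  }

  public static void main(String[] args) {
    // Each printed sub-range could be read independently by a downstream ParDo.
    for (Range r : split(new Range(0, 10), 4)) {
      System.out.println(r);
    }
  }
}
```

In a ParDo-based `readAll`, one step would emit such sub-ranges per query and a following step would read each sub-range, so the work is spread across workers.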
Issue Time Tracking
-------------------
Worklog Id: (was: 219956)
Time Spent: 1h 20m (was: 1h 10m)
> Add IO module for Couchbase
> ---------------------------
>
> Key: BEAM-1893
> URL: https://issues.apache.org/jira/browse/BEAM-1893
> Project: Beam
> Issue Type: New Feature
> Components: io-ideas
> Reporter: Xu Mingmin
> Assignee: LI Guobao
> Priority: Major
> Labels: Couchbase, IO, features, triaged
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Create a {{CouchbaseIO}} for Couchbase database.