[
https://issues.apache.org/jira/browse/BEAM-1893?focusedWorklogId=219957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-219957
]
ASF GitHub Bot logged work on BEAM-1893:
----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Mar/19 12:30
Start Date: 28/Mar/19 12:30
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #8152:
[DoNotMerge][BEAM-1893] Implementation of CouchbaseIO
URL: https://github.com/apache/beam/pull/8152#issuecomment-477574205
I would be surprised if there were no way to estimate the size of the
collection in Couchbase. For example, maybe
[couch_total_disk_size](https://docs.couchbase.com/server/6.0/rest-api/rest-bucket-stats.html)
or something like that?
I am definitely in favor of changing it to a ParDo-based approach in any case. I
just thought that the exercise of writing the IO based on the `BoundedSource` API
was worthwhile, and thought about doing this next, but why not. Remember, however,
that partitioning should be done properly in this case too (you can also take a
look at SolrIO for an example).
For the ParDo-based implementation it is probably a good idea to implement
`readAll` first as a basis for read; for an example, take a look at
`HBaseIO.readAll`. The idea is to pass a `PCollection` of `queries` to make it
composable with only ParDos, and to build the transform from that. Notice that
`HBaseIO.readAll` uses annotations of the new IO API (SplittableDoFn). PLEASE
DON'T USE THOSE, because they are not supported by all runners! However, do
produce the splits with a `RestrictionTracker` (ByteKey or OffsetRange or
whatever they use to represent ranges in Couchbase); this will help us to evolve
in the future towards SDF if there is a need to. If you have doubts, don't
hesitate to ask me @EdgarLGB
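
To illustrate the partitioning point, here is a minimal sketch of splitting a half-open offset range into roughly equal sub-ranges, the kind of split a ParDo-based read could fan out over. It assumes the data can be addressed by a numeric offset, which may not hold for Couchbase. `OffsetRangeSplitSketch`, `Range`, and `split` are hypothetical names for illustration only; they are not Beam or Couchbase APIs, and a real implementation would presumably use Beam's own range and tracker classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java, no Beam or Couchbase dependency.
public class OffsetRangeSplitSketch {

    /** Half-open range [from, to) over numeric offsets (an assumption). */
    public static final class Range {
        public final long from;
        public final long to;

        public Range(long from, long to) {
            if (to < from) {
                throw new IllegalArgumentException("to must be >= from");
            }
            this.from = from;
            this.to = to;
        }

        public long size() {
            return to - from;
        }

        @Override
        public String toString() {
            return "[" + from + ", " + to + ")";
        }
    }

    /**
     * Splits the range into at most desiredNumSplits contiguous,
     * non-overlapping sub-ranges whose sizes differ by at most one.
     */
    public static List<Range> split(Range range, int desiredNumSplits) {
        if (desiredNumSplits < 1) {
            throw new IllegalArgumentException("desiredNumSplits must be >= 1");
        }
        List<Range> result = new ArrayList<>();
        long total = range.size();
        if (total == 0) {
            return result; // nothing to read
        }
        // Never produce more splits than there are offsets.
        long n = Math.min(desiredNumSplits, total);
        long base = total / n;
        long remainder = total % n;
        long start = range.from;
        for (long i = 0; i < n; i++) {
            // The first `remainder` splits take one extra offset.
            long length = base + (i < remainder ? 1 : 0);
            result.add(new Range(start, start + length));
            start += length;
        }
        return result;
    }

    public static void main(String[] args) {
        for (Range r : split(new Range(0, 10), 3)) {
            System.out.println(r);
        }
    }
}
```

Each resulting sub-range could then be emitted as one element and read by a downstream ParDo, which keeps the structure close to what an SDF restriction would look like later.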
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 219957)
Time Spent: 1.5h (was: 1h 20m)
> Add IO module for Couchbase
> ---------------------------
>
> Key: BEAM-1893
> URL: https://issues.apache.org/jira/browse/BEAM-1893
> Project: Beam
> Issue Type: New Feature
> Components: io-ideas
> Reporter: Xu Mingmin
> Assignee: LI Guobao
> Priority: Major
> Labels: Couchbase, IO, features, triaged
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Create a {{CouchbaseIO}} for Couchbase database.