Thomas Tauber-Marshall has posted comments on this change. Change subject: Initial structure for the C++ hiveserver2 client. ......................................................................
Patch Set 12: (17 comments) http://gerrit.cloudera.org:8080/#/c/2645/12/src/hs2client/columnar-row-set.h File src/hs2client/columnar-row-set.h: Line 54: virtual const uint8_t* nulls() const = 0; > let's stick with impala naming conventions What convention? Line 62: TypedColumn(const uint8_t* nulls, const std::vector<T>* data) > does this need to be public? Done http://gerrit.cloudera.org:8080/#/c/2645/12/src/hs2client/fetch-results.h File src/hs2client/fetch-results.h: Line 25: // Row oriented results are not currently supported. > do we plan on supporting this in the future? (and if so, why?) Its not a priority, but we'd like to leave ourselves the ability to support it in the future, as some workloads will be more efficient will row oriented results, and we're trying to keep this as general and close to the hs2 spec as possible in case we want to use this as a client for other non-impala hs2 services in the future. http://gerrit.cloudera.org:8080/#/c/2645/12/src/hs2client/hs2service.h File src/hs2client/hs2service.h: Line 65: // HS2Service objects are created using HS2Service::Connect. They must > function references end in () Done Line 76: class HS2Service { > what was the rationale for prefixing this with 'HS2' (and session) but not I did it this way because that's how it is in Impyla, but of course this is our opportunity to correct mistakes from Impyla, and you make a good point, so I'll remove the HS2 and just rely on the namespace for disambiguation. Line 109: Status OpenSession(const std::string& user, const HS2ClientConfig& config, > is 'user' sufficient as an authentication mechanism? The authentication happens in Service::Connect. This brings up the good point, though, that the definition of Connect above doesn't really make any sense, since if use_ssl was true you would also need to specify a username/password/protocol, so I'll remove use_ssl and when we eventually add authentication support we'll add another Connect function that takes the parameters needed for that. http://gerrit.cloudera.org:8080/#/c/2645/12/src/hs2client/hs2session.h File src/hs2client/hs2session.h: Line 48: std::shared_ptr<Operation>* operation) const; > the class comment for Operation says it's not thread-safe, so why return th Part of the original thinking was that Operations will be passed around a lot by client code, and its easier to do that if they're in shared_ptrs. I agree that its questionable, though, especially since we're using unique_ptrs everywhere else. I'd like to confirm with Wes before making this change, though. Line 55: Status GetSchemas(std::shared_ptr<Operation>* operation) const; > what exactly does this do? (what is role of operation?) The Operation gives us access to its state, such as the log and profile. Much of this is uninteresting or not populated at all for these metadata operations as Impala is currently written, but we may someday want to change that. It also gives you more flexibility, eg. you may want to fetch the results in batches of a certain size for something like GetColumns that might return a large number of results, or you may want to fetch the results in a different order, such as refetching the prior batch of result, once Impala supports different fetch orientations. I may be wrong that anyone would ever want to do those things, but it doesn't seem to hurt too much to leave the option there. Line 59: Status GetTables(std::shared_ptr<Operation>* operation) const; > instead of returning the results of these metadata requests as generic oper As above, I'm not convinced that giving users access to Operation objects for these calls isn't useful (or at the very least, more flexible and future-proof), but you're correct that we could do more here to make this easy for users. I filed an issue for this: https://github.com/cloudera/hs2client/issues/8 http://gerrit.cloudera.org:8080/#/c/2645/12/src/hs2client/operation.h File src/hs2client/operation.h: Line 40: enum class OperationState { > move into class Operation and drop 'Operation' from name? Done Line 41: INITIALIZED_STATE, > embedding the type name in the enum constants is common practice in c to di Done Line 52: // and to retrieve its results. > how do you get result set metadata? (the GetResultSetMetadata rpc) Coming in a follow up review. Line 73: // Fetches a batch of results, stores them in 'results', and sets has_more_rows. > you should point out in the function comments whether a function blocks. i' Done Line 104: // True if this operation has begun execution and has not been closed yet. > how does this related to OperationState? If it is false, that either means that the operation was never successfully created or that it has been closed. Either way, its awkward to rely on calling GetState to get this information, as Impala won't ever actually return a CLOSED state - in both situations Impala won't have any information about the operation so what we'll actually get back is a "invalid query handle" error. The comment is a little misleading, as its not clear if "begun execution" applies when the operation is in state INITIALIZED, so I updated it to be clearer. http://gerrit.cloudera.org:8080/#/c/2645/12/src/hs2client/sample-usage.cc File src/hs2client/sample-usage.cc: Line 85: total_retrieved += int_col->length(); > it feels like the number of rows should also be part of FetchResults The problem with doing that is that it would require us to know what the type of at least one of the columns is when the FetchResult is created in Operation::Fetch so that we can extract the column data from the thrift object. The only way to do that would be to also make a call to GetResultSetMetadata, which seems like overhead that we wouldn't want to add to every call to Fetch, or to require the user to specify the schema when creating the operation or calling Fetch, which seems like an unnecessary complication. Line 87: int32_t int_val = int_col->data()[i]; > why not also add a GetData(int), so you don't always have to deal with the Done http://gerrit.cloudera.org:8080/#/c/2645/12/src/hs2client/util.cc File src/hs2client/util.cc: Line 101: hs2::TGetResultSetMetadataReq req; > i think this part of hs2 functionality should be exposed in a consumable wa Yes, we're working on a follow up patch with this functionality, we're just trying not to keep adding things to this review so that we can actually get it finalized and in so that Wes can start writing the python wrapper. -- To view, visit http://gerrit.cloudera.org:8080/2645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7be82d5897ad6093116915b17924674c4ae508a0 Gerrit-PatchSet: 12 Gerrit-Project: hs2client Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Wes McKinney <[email protected]> Gerrit-HasComments: Yes
