> On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > > Lines 2499 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006476#file2006476line2499> > > > > Heartbeat for each txn should be sent at ConfVars.HIVE_TXN_TIMEOUT/2 > > intervals. HIVE_TXN_TIMEOUT is what the server uses to timeout a txn. > > Sending heartbets more frequently loads the metastore unnecessarily. See > > DbTxnManger.getHeartbeatInterval() for example. This also guarantees that > > server and client are in sync.
Ok. Will fix it. Will make heartbeat interval default to txn_timeout/2. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java > > Lines 54 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006484#file2006484line54> > > > > Why is it useful to make this distinction? Since the imple is able to > > route all events to appropriate partition, why should it not do that all > > the time. This avoids baking the static partition in the record. Static partitions take fast path in which it does not decode the row to extract partition values out. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java > > Lines 68 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006484#file2006484line68> > > > > if isStaticPartitioning() == true, can isDynamicPartitioning == ture, > > i.e. why 2 methods This was just for convenience. I don't think it is required or used other than logging. Will remove it. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > > Lines 121 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006488#file2006488line121> > > > > if all operations on hms client are mutexted via txnBatchLock, why have > > 2 hms clients? > > > > More generally, having 1 thread per connection may be rather heavy > > weight. Each thread will send 1 msg every few miniutes, i.e. will be > > mostly idle. DbTxnManager uses a static pool of 5 or so threads which does > > heartbeats for the whole VM (HS2). I think that would be a good model here > > as well. Only the state updates are mutexed. Actual heartbeat is outside the lock. The write path which happens more frequently is not mutexed. The transaction handling is mutexed so that heartbeat thread gets to see the correct state. Once heartbeat thread gets the first and last transaction id that is open, actual hearbeat is done via second ms client outside of the mutex. And with dynamic partitioning, we don't expect to see many streaming connections. In the earlier model, each partition was handled by a separate connection. In this new model, there can be only one connection per VM to write using streaming connection in which case I think having a static pool for heartbeat threads might not be required. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > > Lines 309 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006488#file2006488line309> > > > > Shoule each streamingConnection have a UUID to include here? I can > > imagine mulitple instances writing to the same table which are not > > distiguishable by db.table Will piggyback on the agentInfo. If agentInfo is not explicitly, will create UUID. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > > Lines 326 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006488#file2006488line326> > > > > this should be in try/catch. SessionState.start() can sometimes throw, > > but after it creates a Session object and ataches it to a threadLocal which > > then makes every user of that thread fail. There is an Apache bug > > describing some situations where this can happen. Will fix it. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > > Lines 337 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006488#file2006488line337> > > > > should include db.table in all exceptions/message Will fix it. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > > Lines 422 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006488#file2006488line422> > > > > The API in general doesn't expose 'catalog' - should it? Is catalog work already part of 3.0.0? If so I will include it. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > > Lines 538 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006488#file2006488line538> > > > > why should the client know this? This is an implementation detail that > > should be hidden This was part of old interface and is used in testing. Will remove it from interface definition. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > > Lines 650 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006488#file2006488line650> > > > > what if someone calls this after above close()? Seems like isClosed() > > should be a property of connection itself not current batch. > > > > The original API had a serious flaw where when something threw an > > exception, the connection was not made fully unusable and some clients > > ignored exceptions and continued to try to wrie corrupting files. I think > > the flow should be if any 'unexpected' error happens connection should > > abort any remaning txns in the batch and make all subsequent methods on it > > throw and not propagate the call further. This is not a public API. Client can only call beginNextTransaction() which already checks for isClosed(). All APIs check for isClosed state and would not let clients to write if closed. There is one more isClosed() missing. I will add it in the next patch. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/TransactionBatch.java > > Line 25 (original), 25 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006506#file2006506line25> > > > > Why does this need to be public at all? If you are hiding the concept > > ot TB from end user, I'd say the Connection object should have > > open/commit/abort. The 1st open() opens N txns at once (as currently), etc > > but the end user doesn't need these implementation details. In particular > > for Blob stores w/o append operation, we'd hae to make batch size = 1. > > Connection.close() would abort any remaining (unsed) txns in the > > current batch. This was also brought up from old code. I don't think this interface is of any use either. Will remove it. > On April 20, 2018, 7:44 p.m., Eugene Koifman wrote: > > streaming/src/java/org/apache/hive/streaming/TransactionBatch.java > > Line 92 (original), 93 (patched) > > <https://reviews.apache.org/r/66645/diff/4/?file=2006506#file2006506line99> > > > > why is this needed? Was there only for testing. Will remove this interface altogether as we are hiding transactions from user anyway. Not required. Will remove it. - Prasanth_J ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66645/#review201640 ----------------------------------------------------------- On April 19, 2018, 2:36 a.m., Prasanth_J wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66645/ > ----------------------------------------------------------- > > (Updated April 19, 2018, 2:36 a.m.) > > > Review request for hive, Ashutosh Chauhan and Eugene Koifman. > > > Bugs: HIVE-19211 > https://issues.apache.org/jira/browse/HIVE-19211 > > > Repository: hive-git > > > Description > ------- > > HIVE-19211: New streaming ingest API and support for dynamic partitioning > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 73492ff > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > 82ba775 > metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java > PRE-CREATION > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java > a66c135 > serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION > streaming/pom.xml b58ec01 > streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java > 25998ae > streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb > streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java > 898b3f9 > streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520 > streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137 > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java > 23e17e7 > streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14 > streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804 > streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d > streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java > 762f5f8 > streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java > 5f9aca6 > streaming/src/java/org/apache/hive/streaming/PartitionHandler.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/QueryFailedException.java > ccd3ae0 > streaming/src/java/org/apache/hive/streaming/RecordWriter.java dc6d70e > streaming/src/java/org/apache/hive/streaming/SerializationError.java > a57ba00 > streaming/src/java/org/apache/hive/streaming/StreamingConnection.java > 2f760ea > streaming/src/java/org/apache/hive/streaming/StreamingException.java > a7f84c1 > streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java > 0dfbfa7 > > streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 0077913 > streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java c0b7324 > streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 2b05771 > > streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java > a8c8cd4 > streaming/src/java/org/apache/hive/streaming/TransactionError.java a331b20 > streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java > f0843a1 > streaming/src/test/org/apache/hive/streaming/TestStreaming.java 6f63bfb > > streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java > PRE-CREATION > > > Diff: https://reviews.apache.org/r/66645/diff/4/ > > > Testing > ------- > > > Thanks, > > Prasanth_J > >