InvisibleProgrammer commented on PR #5567:
URL: https://github.com/apache/hive/pull/5567#issuecomment-2517287336

   > > > I am not sure I get this. There is already a table-level lock for 
atomic stats update and we are adding an extra re-try loop inside?
   > > 
   > > 
   > > The existing table-level lock is an in-memory lock. That means in case 
you have multiple HMS instances and two different processes are getting 
requests in the same time, it just doesn't have any effect - as on JVM level, 
there is only one single instance.
   > 
   > ok, in-memory lock serves diff purpose - thread safety, see 
[HIVE-25904](https://issues.apache.org/jira/browse/HIVE-25904).
   > 
   > Descriptions says "updateTableColumnStatistics can throw 
SQLIntegrityConstraintViolationException during replication if HA is on and two 
different HMS instance gets the same call but with different engine. " isn't 
the `engine` part of PK? If not, maybe it should be..
   > 
   > Some of RawStore interface methods are not retriable by design, so I would 
expect others to be retriable without the need for extra hacks, simply use 
`RetryingMetaStoreClient`.
   > 
   > ```
   > public interface RawStore extends Configurable {
   >   /***
   >    * Annotation to skip retries
   >    */
   >   @Target(value = ElementType.METHOD)
   >   @Retention(value = RetentionPolicy.RUNTIME)
   >   @interface CanNotRetry {
   >   }
   > ```
   
   > Descriptions says "updateTableColumnStatistics can throw 
SQLIntegrityConstraintViolationException during replication if HA is on and two 
different HMS instance gets the same call but with different engine. " isn't 
the `engine` part of PK? If not, maybe it should be..
   
   It is trickier than this. Most of the statistics are stored in 
`TAB_COL_STATS` and it does have an engine field. But we have at least one 
statistics that is stored in `TABLE_PARAMS`. I really don't know the reason, 
but even the code says it shouldn't be in that way. See that comment from 2018: 
   ```
         // TODO: (HIVE-20109) ideally the col stats stats should be in 
colstats, not in the table!
   ```
   I assume that TODO never happened. 
   And TABLE_PARAMS is just a key-value list. For example, at the customer that 
faced with the problem, the duplicate entry is `COLUMN_STATS_ACCURATE`. I 
really don't know what is the reason why this property is in TABLE_PARAMS, not 
TAB_COL_STATS. And I also have no information about how many other fields can 
be in the same situation. 
   
   > Some of RawStore interface methods are not retriable by design, so I would 
expect others to be retriable without the need for extra hacks, simply use 
`RetryingMetaStoreClient`.
   Let me check what that exactly does and if it is possible to apply at the 
customer. 
   TBH, I'm not completely sure if retry is a proper expression about the 
current change: You can imagine, if, for example, we could cover the situation 
with a simple SQL script, it could be an upsert merge statement: insert if it 
doesn't exist. Update if it does. But DataNucleus is an ORM tool that cannot do 
such a thing. But if we force it to query the table state again, at the second 
attempt it will recognise that ups, we have records. So we should do an update 
instead of an insert. 
   Do you know better way to do an insert or update statement, using 
DataNucleus? 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to