zachjsh opened a new issue, #16125:
URL: https://github.com/apache/druid/issues/16125

   ### Description
   
   Druid appears to have a check that short-circuits the work of marking 
segments as used if it thinks that the datasource does not exist. However, if 
all of the data in the table has been soft deleted (markUnused), Druid thinks 
the datasource doesn't exist. Consequently, if all data in a datasource is 
soft deleted, it cannot be recovered:
   
   Here is the Druid check:
   
   
[https://github.com/implydata/druid/blame/4c252b85c30dd780cdcaa5d342abf21d9a5d52f7/[…]main/java/org/apache/druid/server/http/DataSourcesResource.java](https://github.com/implydata/druid/blame/4c252b85c30dd780cdcaa5d342abf21d9a5d52f7/server/src/main/java/org/apache/druid/server/http/DataSourcesResource.java#L283)
   
   
   It seems that this logic has been here since the inception of the markUsed 
API (https://github.com/apache/druid/pull/7490).
   
   
   My thought is that the check for datasource existence was copied from the 
markAsUnused API logic. When marking segments as unused, I believe it does 
make sense to check for a queryable datasource: if there are any segments to 
mark as unused, they are presumably used now, which would make the datasource 
queryable. However, the same check is not appropriate when marking segments 
as used, because the segments to restore may all be unused, in which case the 
datasource is not queryable.
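
   To illustrate the asymmetry, here is a minimal sketch using a hypothetical 
in-memory model. The class `MarkUsedSketch` and all of its method names are 
assumptions made for illustration; this is not the code in 
DataSourcesResource, only a model of the behavior described above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of used/unused segments; not Druid's actual classes or API.
final class MarkUsedSketch {
    // datasource name -> (segment id -> used flag)
    private final Map<String, Map<String, Boolean>> segments = new HashMap<>();

    void addSegment(String dataSource, String segmentId, boolean used) {
        segments.computeIfAbsent(dataSource, k -> new HashMap<>()).put(segmentId, used);
    }

    // In this model a datasource is "queryable" only if at least one segment is used.
    boolean isQueryable(String dataSource) {
        return segmentsOf(dataSource).containsValue(true);
    }

    // Soft delete: flips every used segment to unused and returns the number changed.
    int markAllUnused(String dataSource) {
        return setAll(dataSource, false);
    }

    // Models the reported behavior: short-circuit when the datasource is not queryable.
    int markAllUsedWithQueryableCheck(String dataSource) {
        if (!isQueryable(dataSource)) {
            return 0; // nothing is restored, even though unused segments exist
        }
        return setAll(dataSource, true);
    }

    // Models the proposed behavior: no queryable check before marking segments used.
    int markAllUsedWithoutCheck(String dataSource) {
        return setAll(dataSource, true);
    }

    private Map<String, Boolean> segmentsOf(String dataSource) {
        return segments.getOrDefault(dataSource, Collections.<String, Boolean>emptyMap());
    }

    private int setAll(String dataSource, boolean used) {
        int changed = 0;
        for (Map.Entry<String, Boolean> e : segmentsOf(dataSource).entrySet()) {
            if (e.getValue() != used) {
                e.setValue(used);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        MarkUsedSketch store = new MarkUsedSketch();
        store.addSegment("wiki", "seg1", true);
        store.addSegment("wiki", "seg2", true);
        store.markAllUnused("wiki");
        System.out.println("with check: " + store.markAllUsedWithQueryableCheck("wiki"));
        System.out.println("without check: " + store.markAllUsedWithoutCheck("wiki"));
    }
}
```

   In this model, once both segments are soft deleted, the checked variant 
restores 0 segments while the unchecked variant restores 2, which mirrors the 
recovery problem described in this issue.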
   
   ### Motivation
   
   Without fixing this, users cannot easily recover the data for a datasource 
for which all data has been previously marked unused.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

