JoaoManierii opened a new issue, #8515:
URL: https://github.com/apache/incubator-devlake/issues/8515

   ## Question
   
   We recently had an incident where someone accidentally deleted all three 
layers of the pull request data: Raw, Tool, and the final processed tables.
   
   To mitigate the issue, we started manually recreating records based on the 
errors the ETL pipeline reported: we created some of the missing entries in the 
raw and tool layers, then iteratively filled in the remaining gaps as new errors appeared.
   
   However, we noticed that the conversion between layers does not seem to be 
working reliably. For example, we now have some labels appearing, but the 
corresponding pull requests for those labels are missing.
   
   Is there a safer and more efficient way to rebuild the layers to ensure data 
consistency and integrity across Raw, Tool, and domain layers? We want to avoid 
manual patching if possible and ensure that no orphaned or partial data is left 
behind.
   
   ## Screenshots
   N/A
   
   ## Additional context
   
   We suspect that some conversions silently fail, causing incomplete data 
propagation. A way to verify and recover from missing or partially converted 
data would be very helpful.
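
As an illustration of the kind of check we have in mind, here is a minimal sketch that looks for labels whose parent pull request is missing. It runs against an in-memory SQLite database so that it is self-contained. The table and column names (`pull_requests.id`, `pull_request_labels.pull_request_id`, `label_name`) are assumptions made for this example and may not match the actual schema in every version, and the helper `find_orphaned_labels` is hypothetical.

```python
import sqlite3


def find_orphaned_labels(conn):
    """Return (pull_request_id, label_name) rows with no matching pull request.

    Table and column names are assumptions for illustration only.
    """
    query = """
        SELECT l.pull_request_id, l.label_name
        FROM pull_request_labels AS l
        LEFT JOIN pull_requests AS p ON p.id = l.pull_request_id
        WHERE p.id IS NULL
        ORDER BY l.pull_request_id, l.label_name
    """
    return conn.execute(query).fetchall()


# Tiny stand-in dataset: 'pr:2' has a label but no pull request row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pull_requests (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE pull_request_labels (pull_request_id TEXT, label_name TEXT);
    INSERT INTO pull_requests VALUES ('pr:1', 'Fix login');
    INSERT INTO pull_request_labels VALUES ('pr:1', 'bug'), ('pr:2', 'feature');
""")

orphans = find_orphaned_labels(conn)
print(orphans)
```

The same anti-join pattern (a `LEFT JOIN` filtered on `IS NULL`) could presumably be pointed at any parent/child pair of tables to spot partially converted data, but it only detects orphans; it does not rebuild them.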
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.