JoaoManierii opened a new issue, #8515: URL: https://github.com/apache/incubator-devlake/issues/8515
## Question

We recently had an incident where someone accidentally deleted all three layers of the pull request data: Raw, Tool, and the final processed tables.

To mitigate the issue, we started manually creating records based on what the ETL pipeline was failing on: we created some missing entries in the raw and tool layers and iteratively fixed the remaining gaps based on the ETL errors.

However, we noticed that the conversion between layers does not seem to work reliably. For example, some labels now appear, but the pull requests those labels belong to are missing.

Is there a safer and more efficient way to rebuild the layers so that data stays consistent across the Raw, Tool, and domain layers? We want to avoid manual patching if possible and ensure that no orphaned or partial data is left behind.

## Screenshots

N/A

## Additional context

We suspect that some conversions fail silently, causing incomplete data propagation. A way to verify and recover from missing or partially converted data would be very helpful.
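To make the orphaned-label symptom concrete, here is a minimal sketch of the kind of consistency check we have in mind. It is an illustration only, not something DevLake provides: the table and column names (`pull_requests.id`, `pull_request_labels.pull_request_id`, `pull_request_labels.label_name`) are assumptions about the domain layer, the helper names are hypothetical, and an in-memory SQLite database with made-up rows stands in for the real database.

```python
import sqlite3


def find_orphan_labels(conn):
    """Return (pull_request_id, label_name) rows whose pull request is missing.

    Table and column names are assumptions; adjust them to the real schema.
    """
    query = """
        SELECT l.pull_request_id, l.label_name
        FROM pull_request_labels AS l
        LEFT JOIN pull_requests AS p ON p.id = l.pull_request_id
        WHERE p.id IS NULL              -- no matching pull request row
        ORDER BY l.pull_request_id, l.label_name
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with one deliberately orphaned label."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pull_requests (id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE pull_request_labels (pull_request_id TEXT, label_name TEXT);
        INSERT INTO pull_requests VALUES ('pr-1', 'Add feature');
        INSERT INTO pull_request_labels VALUES ('pr-1', 'enhancement');
        -- 'pr-2' has a label but no pull request row: the symptom we observed
        INSERT INTO pull_request_labels VALUES ('pr-2', 'bug');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for pr_id, label in find_orphan_labels(conn):
        print(f"orphan label {label!r} references missing pull request {pr_id!r}")
```

The same anti-join pattern could presumably be applied to any parent/child pair of tables to list partially converted data after a rebuild.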