GitHub user Nicoleee1108 added a comment to the discussion: Task ideas for the 
dkNet-AI · Apache Texera Agent Hackathon

Row-Level Data Lineage: "Why Is This Row Here?"

  Categories: Data Experience · Innovation

  The problem

Every data analyst, at some point, stares at a number in a result table and
asks: why is this number what it is? Which input rows produced it? Was it
inflated by a duplicate join key? Skewed by one outlier? In SQL or pandas, this
question is hard to answer after the fact: once you've aggregated, the result
no longer records which input rows fed it, and you have to re-run with manual
instrumentation to find out.
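
To make the duplicate-join-key case concrete, here is a minimal sketch in plain
Python. The tables, column names, and values are invented for illustration and
are not taken from any Texera workflow.

```python
from collections import defaultdict

# Hypothetical toy data: customer 1 appears twice (a duplicate join key).
customers = [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 1, "region": "West"},  # accidental duplicate
    {"customer_id": 2, "region": "West"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 100},
    {"order_id": 11, "customer_id": 2, "amount": 50},
]

# Inner join on customer_id, then sum amount per region.
joined = [
    {**c, **o}
    for o in orders
    for c in customers
    if c["customer_id"] == o["customer_id"]
]
totals = defaultdict(int)
for row in joined:
    totals[row["region"]] += row["amount"]

true_total = sum(o["amount"] for o in orders)
print(dict(totals), true_total)  # {'West': 250} 150
```

The aggregated output holds only the inflated 250. Nothing in that single row
says that order 10 was counted twice, which is the gap a "Why?" trace would
close.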
                   
  The idea

  Right-click any output row in Texera's result panel → click "Why?" → the 
canvas dims, then visually traces backwards through every operator, 
highlighting the upstream tuples that contributed to that row. Click any 
intermediate operator and the result panel shows that operator's contributing 
rows. The workflow explains itself.
                     
  A region = West, total = $1.2M row becomes a story you can audit:

  - 1,847 rows survived the Filter
  - Came from 1,847 Join outputs
  - Sourced from 864 customer rows and 1,847 order rows in the CSV scans
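
One way to picture the backward trace is the sketch below, where every row
remembers the upstream rows it was built from. This is a toy model under
stated assumptions, not Texera's API or engine: the names Row, scan, join,
filter_, aggregate, and why are hypothetical, and the data is made up.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count()

@dataclass
class Row:
    op: str                                       # operator that emitted this row
    data: dict
    parents: list = field(default_factory=list)   # direct upstream rows
    row_id: int = field(default_factory=lambda: next(_next_id))

def scan(name, records):
    return [Row(name, dict(r)) for r in records]

def join(left, right, key):
    return [Row("Join", {**l.data, **r.data}, [l, r])
            for l in left for r in right if l.data[key] == r.data[key]]

def filter_(rows, pred):
    return [Row("Filter", dict(r.data), [r]) for r in rows if pred(r.data)]

def aggregate(rows, group_key, value_key):
    groups = {}
    for r in rows:
        groups.setdefault(r.data[group_key], []).append(r)
    return [Row("Aggregate",
                {group_key: k, "total": sum(r.data[value_key] for r in rs)},
                rs)
            for k, rs in groups.items()]

def why(row):
    """Walk parents backwards; map operator name -> ids of contributing rows."""
    contributing = {}
    stack = list(row.parents)
    while stack:
        r = stack.pop()
        ids = contributing.setdefault(r.op, set())
        if r.row_id not in ids:
            ids.add(r.row_id)
            stack.extend(r.parents)
    return contributing

customers = scan("CustomerScan", [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 2, "region": "West"},
    {"customer_id": 3, "region": "East"},
])
orders = scan("OrderScan", [
    {"order_id": 10, "customer_id": 1, "amount": 100},
    {"order_id": 11, "customer_id": 1, "amount": 40},
    {"order_id": 12, "customer_id": 2, "amount": 50},
    {"order_id": 13, "customer_id": 3, "amount": 70},
])
joined = join(customers, orders, "customer_id")
kept = filter_(joined, lambda d: d["region"] == "West")
west = aggregate(kept, "region", "amount")[0]

counts = {op: len(ids) for op, ids in why(west).items()}
print(west.data, counts)
```

For the West row here, the trace reports 3 Filter rows, 3 Join rows, 2
customer rows, and 3 order rows, the same shape as the audit story above at
toy scale. Storing full parent references on every row is the simplest design
to read but costs memory; a real engine would likely need something more
compact.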

GitHub link: 
https://github.com/apache/texera/discussions/5059#discussioncomment-16924906
