kunwp1 opened a new issue, #4588:
URL: https://github.com/apache/texera/issues/4588

   Dataset (too large to upload to GitHub):
   https://texera.eye.som.uci.edu/dashboard/hub/dataset/result/detail/6
   
   Model: Claude-Haiku-4-5
   
   Issue: Given the following prompt, the agent could not create a workflow 
that reads the dataset and instead failed with a `litellm.RateLimitError: 
AnthropicException`.
   
   ```
   # Dataset 
   
   1. TexeraChatbot_testdata_DDX41.txt.gz
   
   This TexeraChatbot_testdata_DDX41.txt.gz file includes a cell-by-gene raw 
count matrix, comprising 15,307 single cells (in columns) and 33,696 features 
(gene symbols, in rows). The first row contains cell barcodes, and the first 
column contains gene symbols.
   
   2. TexeraChatbot_testdata_DDX41_obs.txt.gz
   
   This TexeraChatbot_testdata_DDX41_obs.txt.gz file includes cell-level 
metadata for cell barcodes. The column “barcode” is the unique identifier for 
each cell. Other columns are described below:
   
   - nCount_RNA: total UMI counts per cell
   - nFeature_RNA: total number of detected features per cell
   - percent.mt: percentage of mitochondrial reads per cell
   - pANN: proportion of artificial nearest neighbors calculated by 
DoubletFinder
   - nuclear_fraction: nuclear fraction score, capturing the proportion of 
reads derived from intronic regions; calculated using the DropletQC R package
   - sampleid: 2 unique sample IDs, i.e., DDX41 for DDX41 cKO mouse and WT for 
wild-type mouse. The genotype for the conditional knockout mouse is Ddx41 
fl/fl; ChxCre, and the genotype for the wild-type mouse is Ddx41fl/fl.
   - majorclass: 12 annotated major cell classes, including AC, BC, Cone, HC, 
MG, Microglia, RGC, Rod, Endothelial, Pericyte, RPE, and Astrocyte
   - celltype: high-resolution cell type annotation
   
   In summary, the dataset comprises 15,307 single cells derived from 2 unique 
sample IDs, annotated into 12 major cell classes.
   
   3. TexeraChatbot_testdata_DDX41_var.txt.gz
   
   This TexeraChatbot_testdata_DDX41_var.txt.gz file includes the gene features 
for the single-cell dataset. The “symbol” column contains the gene symbols for 
the 33,696 features, including both protein-coding and non-coding genes. Gene 
identifiers are gene symbols, and the RNA genome build used is the mouse 
reference (GRCm39).
   ```
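
For reference, the file layout described in the prompt (genes in rows, cell barcodes in the first row, gene symbols in the first column) can be read outside the agent with a few lines of plain Python. This is only a minimal sketch, not the workflow the agent was expected to build: the helper name `per_cell_totals` is hypothetical, and the tab delimiter is an assumption, since the issue does not state how the `.txt.gz` files are delimited. It streams the matrix row by row and accumulates, per cell, the column sum and the number of non-zero genes, which is how `nCount_RNA` and `nFeature_RNA` are described above.

```python
import csv
import gzip


def per_cell_totals(path, delimiter="\t"):
    """Stream a gzipped gene-by-cell count matrix and summarize each cell.

    Returns (barcodes, n_count, n_feature): n_count[i] is the column sum for
    cell i and n_feature[i] is its number of non-zero rows (genes).
    The delimiter default is an assumption, not taken from the issue.
    """
    barcodes = None
    n_count = n_feature = None
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)  # first row: cell barcodes
        for row in reader:
            if not row:
                continue
            values = row[1:]  # row[0] is the gene symbol
            if barcodes is None:
                # Some exports omit the top-left corner cell; handle both.
                barcodes = header if len(header) == len(values) else header[1:]
                n_count = [0.0] * len(barcodes)
                n_feature = [0] * len(barcodes)
            if len(values) != len(barcodes):
                raise ValueError("row width does not match the header")
            for i, raw in enumerate(values):
                count = float(raw)
                if count:
                    n_count[i] += count
                    n_feature[i] += 1
    if barcodes is None:
        raise ValueError("matrix has no data rows")
    return barcodes, n_count, n_feature
```

Because only one row is held in memory at a time, this avoids materializing the full 33,696 by 15,307 matrix; the returned lists can then be compared against the `nCount_RNA` and `nFeature_RNA` columns of the `_obs` file.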


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
