[
https://issues.apache.org/jira/browse/NIFI-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew M. Lim updated NIFI-13433:
---------------------------------
Description:
I have a ChunkDocument processor connected to PutChroma. The FlowFile content
from ChunkDocument is as follows:
{
  "text" : "Title 1",
  "metadata" : {
    "languages" : [ "Eng" ],
    "page_number" : 1,
    "filetype" : "application/pdf",
    "category" : "Title",
    "filename" : "pdf/PDF-test.pdf",
    "uuid" : "eeb8441d-0e4b-4d24-bd93-78e8f352306b",
    "chunk_index" : 0,
    "chunk_count" : 3,
    "parent_id" : null
  }
}
Running PutChroma gives the following error because Chroma requires each
metadata value to be a str, int, float or bool, and "languages" is a list:
PythonProcessor[type=PutChroma, id=b65634ca-0211-30e4-0c1b-49776e379b4b] Failed
to transform FlowFile[filename=pdf/PDF-test.pdf]: py4j.Py4JException: An
exception was raised by the Python Proxy. Return Message:
Traceback (most recent call last):
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/python/framework/py4j/java_gateway.py", line 2466, in _call_proxy
    return_value = getattr(self.pool[obj_id], method)(*params)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/python/api/nifiapi/flowfiletransform.py", line 33, in transformFlowFile
    return self.transform(self.process_context, flowfile)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/python/extensions/vectorstores/PutChroma.py", line 123, in transform
    collection.upsert(ids, embeddings, metadatas, texts)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/models/Collection.py", line 477, in upsert
    ) = self._validate_embedding_set(
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/models/Collection.py", line 554, in _validate_embedding_set
    validate_metadatas(maybe_cast_one_to_many_metadata(metadatas))
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/types.py", line 291, in validate_metadatas
    validate_metadata(metadata)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/types.py", line 259, in validate_metadata
    raise ValueError(
ValueError: Expected metadata value to be a str, int, float or bool, got ['Eng'] which is a <class 'list'>
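One possible workaround, sketched here only as an illustration, is to flatten the metadata before it reaches collection.upsert. The helper name sanitize_metadata and its list_separator parameter are hypothetical and are not part of NiFi or chromadb. Joining list items into a single string and dropping null values are assumptions based solely on the accepted types named in the error message; the actual fix in PutChroma may differ.

```python
import json

# Types the error message lists as acceptable metadata values.
ALLOWED_TYPES = (str, int, float, bool)


def sanitize_metadata(metadata, list_separator=","):
    """Return a copy of `metadata` that holds only scalar values.

    Hypothetical helper for illustration; not part of NiFi or chromadb.
    """
    cleaned = {}
    for key, value in metadata.items():
        if isinstance(value, ALLOWED_TYPES):
            cleaned[key] = value
        elif value is None:
            # Assumption: drop nulls, since None is not among the accepted types.
            continue
        elif isinstance(value, (list, tuple)):
            # Assumption: a joined string is an acceptable stand-in for a list.
            cleaned[key] = list_separator.join(str(item) for item in value)
        else:
            # Fall back to a JSON string for nested objects.
            cleaned[key] = json.dumps(value)
    return cleaned


# Subset of the metadata from the FlowFile shown in the description.
metadata = {
    "languages": ["Eng"],
    "page_number": 1,
    "filetype": "application/pdf",
    "parent_id": None,
}
print(sanitize_metadata(metadata))
```

With this sketch, "languages" becomes the string "Eng" and "parent_id" is omitted, so every remaining value is one of the types named in the ValueError.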
> PutChroma processor has errors when "languages" is a list
> ---------------------------------------------------------
>
> Key: NIFI-13433
> URL: https://issues.apache.org/jira/browse/NIFI-13433
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Andrew M. Lim
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)