Request for NiFi Dataset Availability for LLM Training

Ray Valle Mon, 21 Jul 2025 09:43:33 -0700

Dear Apache NiFi Team,

I hope this message finds you well.


My name is Ray Valle, and I am currently leading a project that integrates 
Apache NiFi into a broader AI-driven infrastructure. As part of our 
development, we are training a domain-specific large language model (LLM) to 
understand, validate, and enhance NiFi dataflows and templates.

To support this initiative, I would like to ask whether the Apache NiFi 
community or the foundation maintains, or can recommend, any publicly 
available datasets (such as sample templates, configuration files, processor 
metadata, flow logs, or documentation corpora) that could be used for 
training or fine-tuning an LLM on NiFi-specific tasks.

If such datasets exist, or if there are licensing considerations or 
contribution guidelines I should be aware of, I would greatly appreciate your 
guidance.

Thank you in advance for your time and support. I am enthusiastic about 
contributing to the NiFi ecosystem through AI-enhanced tooling and would 
welcome any opportunity to align with community efforts.

Warm regards,
Ray Valle
Founder | PremierBooks AI Initiative
Email: [email protected]


Disclaimer: This message is confidential, intended only for the named 
recipient(s), and may contain information that is privileged or exempt from 
disclosure under applicable law. If you are not the intended recipient(s) of 
this message, you are notified that the dissemination, distribution, or copying 
of this message is strictly prohibited. If you receive this message in error or 
are not the named recipient(s), please notify the sender by return email and 
delete this message. Thank you.
