This is an automated email from the ASF dual-hosted git repository.
jin pushed a change to branch text2gql
in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-ai.git
from 28f85757 feat(llm): support vector db layer V1.0 (#304)
new 643a3bb8 feat: add configuration management module with dictionary paths and generation parameters
new cfdfea12 feat: add Gremlin parsing base classes with Step, Traversal core data structures
new 220d84e2 feat: add Gremlin expression processing module with predicates and connectors support
new 84f7a417 feat: add graph database schema management with vertex/edge labels and properties
new 8e6df488 feat: add Gremlin base component library with synonym replacement and data instances
new d901c1ab feat: add ANTLR syntax tree visitor with Gremlin query to Recipe parsing and call/with support
new b69ad852 feat: add recursive backtracking traversal generator for diverse query variants from Recipe
new 56d88de3 feat: add main corpus generator with batch processing, global deduplication and error handling
new 36f6469a config: add global configuration file with generation parameters and path settings
new eee134b9 data: add cypher2gremlin dataset with 3514 real query templates
new 5aa54f93 docs: add project README with quick start guide and usage instructions
new 4b9fb27a feat: add ANTLR-generated Gremlin grammar package with lexer, parser and visitor classes
new ddaae793 data: add schema and graph data
new b1399602 feat: add template directory with schema dictionary and synonym files
new cc18bfd2 test: add gremlin statement generalization generation test module
new d1cc6fce test: add generator unit tests for corpus generation validation
new 32a8eda3 Add graph2gremlin.py: Initial template-based Gremlin data generation with correctness guarantee and preliminary question generalization
new 06b95712 Add gremlin_checker.py: Syntax checking using Antlr4
new 8a5f5dc0 Add llm_handler.py: LLM interaction model for query generalization and translation
new ec1bf700 Add qa_generalize.py: Seed data generalization using gremlin_checker and llm_handler
new 41ae008c Add instruct_convert.py: Instruction format conversion and train/test set division
new 8610d010 Add da_data: Schema and graph data
new ba36dd6c Add data/seed_data: Seed data directory
new b9762b6d Add data/vertical_training_sets: Vertical domain scenario generalized data directory
new 21109b8d Add books on Gremlin syntax knowledge to process data.
new 936db1eb Add a dataset of Gremlin QA pairs synthesized based on LLM.
new c7db577a Add README.md
new 999fea15 Compatible with OpenAI format
new ece915b4 Increase Gremlin syntax vocabulary that supports generalization, and add data control policies.
new 3677b521 modify README.md
new 958fef30 Add Apache-2.0 license, fix review comments
new f84276ba Modify the .licenserc.yaml file to ignore license checks for .interp, .tokens, and .csv files.
The 32 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.licenserc.yaml | 3 +
text2gremlin/AST_Text2Gremlin/README.md | 183 +
.../AST_Text2Gremlin/base/CombinationController.py | 370 +
text2gremlin/AST_Text2Gremlin/base/Config.py | 92 +
text2gremlin/AST_Text2Gremlin/base/GremlinBase.py | 370 +
text2gremlin/AST_Text2Gremlin/base/GremlinExpr.py | 120 +
text2gremlin/AST_Text2Gremlin/base/GremlinParse.py | 95 +
.../AST_Text2Gremlin/base/GremlinTransVisitor.py | 2383 ++
text2gremlin/AST_Text2Gremlin/base/Schema.py | 215 +
.../AST_Text2Gremlin/base/TraversalGenerator.py | 3224 ++
.../AST_Text2Gremlin/base/__init__.py | 12 +-
.../base/combination_control_config.json | 279 +
text2gremlin/AST_Text2Gremlin/base/generator.py | 445 +
.../AST_Text2Gremlin/base/gremlin/Gremlin.g4 | 3094 ++
.../AST_Text2Gremlin/base/gremlin/Gremlin.interp | 855 +
.../AST_Text2Gremlin/base/gremlin/Gremlin.tokens | 532 +
.../base/gremlin/GremlinLexer.interp | 865 +
.../AST_Text2Gremlin/base/gremlin/GremlinLexer.py | 1616 +
.../base/gremlin/GremlinLexer.tokens | 532 +
.../base/gremlin/GremlinListener.py | 3765 +++
.../AST_Text2Gremlin/base/gremlin/GremlinParser.py | 32757 +++++++++++++++++++
.../base/gremlin/GremlinVisitor.py | 2106 ++
.../AST_Text2Gremlin/base/gremlin}/__init__.py | 0
.../AST_Text2Gremlin/base/template/schema_dict.txt | 86 +
.../AST_Text2Gremlin/base/template/syn_dict.txt | 12 +
text2gremlin/AST_Text2Gremlin/config.json | 15 +
.../db_data/movie/raw_data/edge_acted_in.csv | 667 +
.../db_data/movie/raw_data/edge_directed.csv | 67 +
.../db_data/movie/raw_data/edge_has_genre.csv | 152 +
.../db_data/movie/raw_data/edge_has_keyword.csv | 5120 +++
.../db_data/movie/raw_data/edge_is_friend.csv | 701 +
.../db_data/movie/raw_data/edge_produce.csv | 62 +
.../db_data/movie/raw_data/edge_rate.csv | 297 +
.../db_data/movie/raw_data/edge_write.csv | 109 +
.../db_data/movie/raw_data/vertex_genre.csv | 21 +
.../db_data/movie/raw_data/vertex_keyword.csv | 3255 ++
.../db_data/movie/raw_data/vertex_movie.csv | 56 +
.../db_data/movie/raw_data/vertex_person.csv | 667 +
.../db_data/movie/raw_data/vertex_user.csv | 236 +
.../db_data/schema/movie_schema.json | 316 +
text2gremlin/AST_Text2Gremlin/generate_corpus.py | 209 +
.../AST_Text2Gremlin/gremlin_templates.csv | 199 +
.../output/generated_corpus_20251029_190729.json | 5983 ++++
text2gremlin/AST_Text2Gremlin/requirements.txt | 44 +
44 files changed, 72181 insertions(+), 6 deletions(-)
create mode 100644 text2gremlin/AST_Text2Gremlin/README.md
create mode 100644 text2gremlin/AST_Text2Gremlin/base/CombinationController.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/Config.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/GremlinBase.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/GremlinExpr.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/GremlinParse.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/GremlinTransVisitor.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/Schema.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/TraversalGenerator.py
copy hugegraph-llm/src/hugegraph_llm/enums/property_cardinality.py => text2gremlin/AST_Text2Gremlin/base/__init__.py (80%)
create mode 100644 text2gremlin/AST_Text2Gremlin/base/combination_control_config.json
create mode 100644 text2gremlin/AST_Text2Gremlin/base/generator.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/Gremlin.g4
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/Gremlin.interp
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/Gremlin.tokens
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/GremlinLexer.interp
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/GremlinLexer.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/GremlinLexer.tokens
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/GremlinListener.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/GremlinParser.py
create mode 100644 text2gremlin/AST_Text2Gremlin/base/gremlin/GremlinVisitor.py
copy {hugegraph-llm/src/hugegraph_llm => text2gremlin/AST_Text2Gremlin/base/gremlin}/__init__.py (100%)
create mode 100644 text2gremlin/AST_Text2Gremlin/base/template/schema_dict.txt
create mode 100644 text2gremlin/AST_Text2Gremlin/base/template/syn_dict.txt
create mode 100644 text2gremlin/AST_Text2Gremlin/config.json
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_acted_in.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_directed.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_has_genre.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_has_keyword.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_is_friend.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_produce.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_rate.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/edge_write.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/vertex_genre.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/vertex_keyword.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/vertex_movie.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/vertex_person.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/movie/raw_data/vertex_user.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/db_data/schema/movie_schema.json
create mode 100644 text2gremlin/AST_Text2Gremlin/generate_corpus.py
create mode 100644 text2gremlin/AST_Text2Gremlin/gremlin_templates.csv
create mode 100644 text2gremlin/AST_Text2Gremlin/output/generated_corpus_20251029_190729.json
create mode 100644 text2gremlin/AST_Text2Gremlin/requirements.txt