Hi,
Yes, the antlr3.g file has the same detailed definitions. However, ANTLR v3 allows
users to explicitly define the structure of the tree.
For example,
setStorageGroup
: KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
-> ^(TOK_SET ^(TOK_STORAGEGROUP prefixPath))
;
The structure of the tree is:
'SET'
|
'STORAGEGROUP'
|
prefixPath
prefixPath is itself a subtree. Users can analyze the AST recursively with a
function such as analyze(prefixPath), and the data are accessed by reference.
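To make the by-reference access concrete, here is a minimal Java sketch. The Ast class, the TOK_PATH node and the assumption that prefixPath is a node with one child per path segment are hypothetical stand-ins for illustration, not the antlr3 runtime classes or IoTDB's actual tree layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an ANTLR v3 AST node (NOT the antlr3 runtime API).
// Each node holds a reference to its token text and to its children.
class V3TreeSketch {
    static final class Ast {
        final String text;        // token text, stored once when the node is built
        final List<Ast> children;

        Ast(String text, Ast... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Follows the shape ^(TOK_SET ^(TOK_STORAGEGROUP prefixPath)):
    // SET -> STORAGEGROUP -> prefixPath, moving only along child references.
    static List<String> analyzeSetStorageGroup(Ast set) {
        Ast storageGroup = set.children.get(0);
        Ast prefixPath = storageGroup.children.get(0);
        List<String> segments = new ArrayList<>();
        analyze(prefixPath, segments);
        return segments;
    }

    // Recursively analyzes a subtree, collecting leaf texts. The list receives
    // references to the existing Strings; no new text is built.
    static void analyze(Ast node, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(node.text);
            return;
        }
        for (Ast child : node.children) {
            analyze(child, out);
        }
    }

    public static void main(String[] args) {
        // TOK_PATH is a hypothetical node type for the prefixPath subtree.
        Ast path = new Ast("TOK_PATH", new Ast("root"), new Ast("sg1"));
        Ast tree = new Ast("TOK_SET", new Ast("TOK_STORAGEGROUP", path));
        System.out.println(analyzeSetStorageGroup(tree)); // [root, sg1]
    }
}
```

In this model, reading a segment returns the same String object the node already holds, so walking the tree costs no string construction.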
In ANTLR v4, however, the '->' rewrite operator has been removed, so the
statement for setting a storage group is defined as
setStorageGroup
: KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
;
If we need the text of prefixPath, we can call prefixPath.getText(), which is
clearer and more direct for developers.
However, if prefixPath is not a leaf node, getText() does not return a
reference: it creates a new StringBuilder and concatenates the text of all the
node's children (recursively, so every non-leaf child builds its own string).
Although appending to a StringBuilder is faster than concatenating Strings,
creating StringBuilders this frequently is a heavy overhead, which cancels out
the benefit and can even reduce overall performance.
For now, I think this is what causes the problem.
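The cost can be seen in a small self-contained Java model. Terminal and Rule below are simplified stand-ins modeled on ANTLR v4's TerminalNode and RuleContext (in the v4 runtime, RuleContext.getText() concatenates its children's text into a fresh StringBuilder); the buildersCreated counter, the nested() helper and the token split of "root.sg1" are hypothetical additions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of ANTLR v4 parse-tree text retrieval (not the real runtime).
class GetTextSketch {
    // Hypothetical counter: how many StringBuilders getText() has created.
    static int buildersCreated = 0;

    interface Tree {
        String getText();
    }

    // Leaf, like a TerminalNode: returns the token's stored text directly.
    static final class Terminal implements Tree {
        private final String text;

        Terminal(String text) {
            this.text = text;
        }

        @Override
        public String getText() {
            return text;
        }
    }

    // Inner node, like a RuleContext: rebuilds its text on every call.
    static final class Rule implements Tree {
        private final List<Tree> children;

        Rule(Tree... children) {
            this.children = Arrays.asList(children);
        }

        @Override
        public String getText() {
            if (children.isEmpty()) {
                return "";
            }
            buildersCreated++;
            StringBuilder builder = new StringBuilder();
            for (Tree child : children) {
                builder.append(child.getText()); // recurses into child rules
            }
            return builder.toString();
        }
    }

    // Hypothetical helper: wraps a leaf in `depth` single-child rules, like a
    // literal that passes through several narrowly categorized grammar rules.
    static Tree nested(Tree leaf, int depth) {
        Tree tree = leaf;
        for (int i = 0; i < depth; i++) {
            tree = new Rule(tree);
        }
        return tree;
    }

    public static void main(String[] args) {
        Tree prefixPath = new Rule(new Terminal("root"), new Terminal("."), new Terminal("sg1"));
        for (int i = 0; i < 1000; i++) {
            prefixPath.getText();
        }
        System.out.println(buildersCreated); // 1000: one builder per call

        buildersCreated = 0;
        nested(new Terminal("3.14"), 5).getText();
        System.out.println(buildersCreated); // 5: one builder per rule level
    }
}
```

Under these assumptions, each getText() call on a non-leaf node allocates one builder per rule level beneath it, so deeply nested rules multiply the cost. This is one way to read the suggestion in the quoted messages that more general, flatter rules could reduce the overhead.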
Best,
---------------------
Yuyuan KANG
> -----Original Message-----
> From: "Xiangdong Huang" <[email protected]>
> Sent: 2019-09-09 00:08:00 (Monday)
> To: [email protected]
> Cc:
> Subject: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when using ANTLR v4
>
> Hi,
>
> > There are some grammar definitions that are too detailed, such as decimal
> > numbers, which are categorized into many types. I think making the rules
> > more general may reduce the number of calls to the getText() method.
>
> One question: does the antlr3.g file have the same detailed definitions,
> e.g., for decimal numbers?
>
> Best,
>
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
>
>
> 康愈圆 <[email protected]> wrote on Thu, Sep 5, 2019 at 11:11 PM:
>
> > Hi,
> >
> > I've been working on the JIRA issue [IOTDB-190 switch to ANTLR v4] recently.
> >
> > I have implemented the SQL parsing module. However, parsing efficiency
> > seems to drop considerably when using ANTLR v4.
> >
> > It turns out that RuleContext.getText() is called frequently and accounts
> > for more than 90% of the CPU time.
> >
> > The grammar definition (.g4 file) here is carried over from the previous
> > version (ANTLR v3). Some grammar definitions are too detailed; for example,
> > decimal numbers are categorized into many types. I think making the rules
> > more general may reduce the number of calls to the getText() method.
> >
> > I plan to restructure the grammar definition to improve parsing
> > efficiency.
> >
> > ----
> > Yuyuan KANG
> >
> > On 2019-09-06 13:30:00, Yuyuan KANG (Jira) <[email protected]> wrote:
> > > Yuyuan KANG created IOTDB-201:
> > > ---------------------------------
> > >
> > > Summary: Query parsing runs slower when using ANTLR v4
> > > Key: IOTDB-201
> > > URL: https://issues.apache.org/jira/browse/IOTDB-201
> > > Project: Apache IoTDB
> > > Issue Type: Improvement
> > > Reporter: Yuyuan KANG
> > >
> > >
> > > The system currently uses ANTLR v3. When it is migrated to ANTLR v4 with
> > > the previous grammar definition, experimental results show that the
> > > efficiency of logical plan generation is negatively impacted.
> > >
> >
> >