Hi,

Yes, the antlr3.g file has the same detailed definitions. However, ANTLR v3 allows
users to explicitly define the structure of the tree.

For example,

setStorageGroup
  : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
  -> ^(TOK_SET ^(TOK_STORAGEGROUP prefixPath))
  ;

the resulting tree has the following structure:

            'SET'
              |
        'STORAGEGROUP'
              |    
         prefixPath

prefixPath is itself a subtree. Users can analyse the AST recursively with a
function such as analyze(prefixPath), and the data is accessed by reference.
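
As a rough illustration of this kind of recursive, by-reference traversal, here
is a minimal sketch. AstNode, AstWalk, the TOK_PATH token and the shape of the
prefixPath subtree are hypothetical stand-ins written for this mail; they are
not the ANTLR v3 classes or the actual IoTDB code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an AST node; NOT the real ANTLR v3 class.
class AstNode {
    final String token;
    final List<AstNode> children;

    AstNode(String token, AstNode... children) {
        this.token = token;
        this.children = Arrays.asList(children);
    }
}

class AstWalk {
    // Recursively collects the leaf tokens below a node. The token strings
    // are handed over by reference; nothing is copied or concatenated.
    static void analyze(AstNode node, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(node.token);
            return;
        }
        for (AstNode child : node.children) {
            analyze(child, out);
        }
    }

    // Assumed tree for "SET STORAGE GROUP TO root.sg1" (TOK_PATH is made up).
    static AstNode sampleTree() {
        AstNode prefixPath = new AstNode("TOK_PATH", new AstNode("root"), new AstNode("sg1"));
        return new AstNode("TOK_SET", new AstNode("TOK_STORAGEGROUP", prefixPath));
    }

    public static void main(String[] args) {
        AstNode storageGroup = sampleTree().children.get(0);
        List<String> segments = new ArrayList<>();
        analyze(storageGroup.children.get(0), segments);
        System.out.println(segments); // prints [root, sg1]
    }
}
```

The point of the sketch is only that each level of the walk passes node
references along and never builds intermediate strings.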

However, in ANTLR v4, the '->' tree-rewrite operator is no longer supported in
parser rules. So the statement for setting a storage group is defined as

setStorageGroup
  : KW_SET KW_STORAGE KW_GROUP KW_TO prefixPath
  ;

If we need the text of prefixPath, we can call prefixPath.getText(), which is
clearer and more direct for developers. However, if prefixPath is not a leaf
node, getText() creates a StringBuilder to concatenate the text of its children
instead of accessing the data by reference. Although operations on a
StringBuilder are faster than on a String, creating StringBuilders too
frequently is a heavy overhead, which cancels out the benefit and can even
reduce the overall performance.
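
To make the cost concrete, here is a minimal sketch that counts how many
StringBuilders a single getText() call allocates on a small subtree, next to
an alternative that threads one shared buffer through the subtree. Tree, Leaf,
Rule and GetTextCost are hypothetical classes that only mimic the shape of a
parse tree; they are not the ANTLR v4 runtime API, and the structure assumed
for prefixPath is a guess.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a parse tree; NOT the ANTLR v4 runtime classes.
abstract class Tree {
    abstract String getText();
    abstract void appendTo(StringBuilder sb);
}

class Leaf extends Tree {
    final String text;

    Leaf(String text) { this.text = text; }

    // A leaf returns its stored string by reference; nothing is allocated.
    String getText() { return text; }

    void appendTo(StringBuilder sb) { sb.append(text); }
}

class Rule extends Tree {
    static int buildersCreated = 0; // counts StringBuilder allocations
    final List<Tree> children;

    Rule(Tree... children) { this.children = Arrays.asList(children); }

    // Per-node strategy: every rule node builds its own temporary buffer.
    String getText() {
        buildersCreated++;
        StringBuilder sb = new StringBuilder();
        for (Tree child : children) {
            sb.append(child.getText());
        }
        return sb.toString();
    }

    // Alternative: append into one buffer supplied by the caller.
    void appendTo(StringBuilder sb) {
        for (Tree child : children) {
            child.appendTo(sb);
        }
    }
}

class GetTextCost {
    // Assumed shape: prefixPath -> nodeName '.' nodeName
    static Tree samplePath() {
        return new Rule(new Rule(new Leaf("root")), new Leaf("."), new Rule(new Leaf("sg1")));
    }

    public static void main(String[] args) {
        Tree path = samplePath();
        Rule.buildersCreated = 0;
        String text = path.getText();
        System.out.println(text + " builders=" + Rule.buildersCreated); // prints root.sg1 builders=3

        StringBuilder shared = new StringBuilder(); // the only buffer used here
        path.appendTo(shared);
        System.out.println(shared); // prints root.sg1
    }
}
```

In this model one getText() call on a three-rule subtree allocates three
builders, and the number grows with every extra rule level, which is why very
detailed grammar rules make each call more expensive.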

Currently, I think this is what leads to the problem.

Best,
---------------------
Yuyuan KANG



> -----Original Message-----
> From: "Xiangdong Huang" <saint...@gmail.com>
> Sent: 2019-09-09 00:08:00 (Monday)
> To: dev@iotdb.apache.org
> Cc: 
> Subject: Re: [jira] [Created] (IOTDB-201) Query parsing runs slower when using 
> ANTLR v4
> 
> Hi,
> 
> > There are some grammar definitions that are too detailed, such as decimal
> > numbers, which are categorized into many types. I think making the rules
> > more general may reduce the number of calls to the getText() method.
> 
> One question, does the antlr3.g file have the same detailed definition,
> e.g., the decimal numbers?
> 
> Best,
> 
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
> 
>  黄向东
> 清华大学 软件学院
> 
> 
> Yuyuan KANG <ky...@mails.tsinghua.edu.cn> wrote on Thu, Sep 5, 2019 at 11:11 PM:
> 
> > Hi,
> >
> > I've been working on JIRA issue [IOTDB-190 switch to ANTLR v4] these days.
> >
> > I implemented the SQL parsing module. However, it seems that the parsing
> > efficiency drops considerably when using ANTLR v4.
> >
> > It turns out that RuleContext.getText() is frequently called, which takes
> > more than 90% of the CPU time.
> >
> > The grammar definition (.g4 file) here is a continuation of the previous
> > version (ANTLR v3). There are some grammar definitions that are too
> > detailed, such as decimal numbers, which are categorized into many types. I
> > think making the rules more general may reduce the number of calls to the
> > getText() method.
> >
> > I plan to restructure the grammar definition to improve the parsing
> > efficiency.
> >
> > ----
> > Yuyuan KANG
> >
> > On 2019-09-06 13:30:00, Yuyuan KANG (Jira) <j...@apache.org> wrote:
> > > Yuyuan KANG created IOTDB-201:
> > > ---------------------------------
> > >
> > >              Summary: Query parsing runs slower when using ANTLR v4
> > >                  Key: IOTDB-201
> > >                  URL: https://issues.apache.org/jira/browse/IOTDB-201
> > >              Project: Apache IoTDB
> > >           Issue Type: Improvement
> > >             Reporter: Yuyuan KANG
> > >
> > >
> > > The system now uses ANTLR v3. When migrating to ANTLR v4 using the
> > > previous grammar definition, experimental results show that the efficiency
> > > of logical plan generation is negatively impacted.
> > >
> > >
> > >
> > > --
> > > This message was sent by Atlassian Jira
> > > (v8.3.2#803003)
> >
> >
