Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Devicemap Wiki" for 
change notification.

The "Patterns2" page has been changed by rezan:
https://wiki.apache.org/devicemap/Patterns2?action=diff&rev1=1&rev2=2

   * Description
   * Publish date
  
+ The objects will also contain the attributes defined below in this specification.
+ 
  
  
  = Input Parsing =
@@ -56, +58 @@

  
  Each pattern file defines the domain input parsing rules:
  
-  input transformers::
+  inputTransformers::
   :: Type: list of transformation steps
   :: Optional. Default: none
   :: TODO: define what exactly these can be.
  
-  token separators::
+  tokenSeparators::
   :: Type: list of token separator strings
   :: Optional. Default: none
  
-  ngram concatenation size::
+  ngramConcatSize::
   :: Type: integer greater than zero
   :: Optional. Default: 1
  
@@ -82, +84 @@

  pattern matching step before moving on to the next token. This algorithm is pipeline
  and thread safe.
  
- If the ngram concatenation size is greater than 1, the largest ngram must be
+ If the ngramConcatSize is greater than 1, the largest ngram must be
  made first before creating the smaller ngrams.
  
  
  === Example ===
  
  {{{
- Transformer: lowercase, [0-9]+ => _NUM
+ inputTransformers: lowercase, [0-9]+ => _NUM
- Token separators: [space]
+ tokenSeparators:   [space]
- ngram concatenation size: 2
+ ngramConcatSize:   2
  
- input string: A 12 xyZ
+ Input string:  A 12 xyZ
  
- Post transform: a _NUM xyz
+ Transform:     a _NUM xyz
  
- Post tokenization: a, _NUM, xyz
+ Tokenization:  a, _NUM, xyz
  
- Post ngram (token stream): a_NUM, a, _NUMxyz, _NUM, xyz
+ Ngram:         a_NUM, a, _NUMxyz, _NUM, xyz
  }}}
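
The example above can be reproduced with a short Python sketch. It is only an
illustration under assumptions the specification does not state: the function
names (transform, tokenize, ngrams, parse) are hypothetical, the transformers
are applied in the order listed, empty tokens are dropped, and ngrams are built
by concatenating adjacent tokens with no separator, largest first at each
token position.

```python
import re


def transform(text):
    """Apply the example transformers in order: lowercase, then [0-9]+ => _NUM."""
    text = text.lower()
    return re.sub(r"[0-9]+", "_NUM", text)


def tokenize(text, separators):
    """Split on any of the separator strings (assumption: empty tokens are dropped)."""
    if not separators:
        return [text]
    pattern = "|".join(re.escape(sep) for sep in separators)
    return [token for token in re.split(pattern, text) if token]


def ngrams(tokens, size):
    """For each token position, emit the largest ngram first, then the smaller ones."""
    stream = []
    for start in range(len(tokens)):
        largest = min(size, len(tokens) - start)
        for length in range(largest, 0, -1):
            stream.append("".join(tokens[start:start + length]))
    return stream


def parse(text, separators, size):
    """Run the three input parsing steps and return the token stream."""
    return ngrams(tokenize(transform(text), separators), size)


print(parse("A 12 xyZ", [" "], 2))
```

With the example input, the printed token stream matches the Ngram line of the
example.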
  
  
@@ -120, +122 @@

  
  All the pattern types are prefixed with 'Simple'. This means that each pattern token is matched
  using a plain UTF-8 string comparison. No regex or other syntax is allowed in Simple patterns.
+ This allows the algorithm to use simple string hashing for matching. This gives maximum performance and scaling complexity equal to a hash table implementation. A Simple``HashCount attribute can optionally be defined, which hints to the classifier how many unique hashes it would need to generate to support the pattern set.
- This allows the algorithm us use string hashing for matching. This gives maximum performance
- and scaling complexity equal to a hashtable implementation. A SimpleHashCount attribute can
- be defined which hints the classifier as to how many unique hashes it would need to generate to
- support the pattern set.
  
  Pattern attributes:
  
@@ -138, +137 @@

   RankValue::
   :: Type: integer
   :: Optional. Default: 0.
-  :: Use defined by RankType.
  
   PatternType::
   :: Type: string
@@ -159, +157 @@

  The following pattern types are defined:
  
   SimpleOrderedAnd::
-  :: Each pattern token must appear in the token stream in index order, as defined in the PatternTokens list. Its okay for non matched tokens to appear inbetween matched tokens as long as the matched tokens are still in order.
+  :: Each pattern token must appear in the token stream in index order, as defined in the Pattern``Tokens list. It's okay for unmatched tokens to appear in between matched tokens as long as the matched tokens are still in order.
  
   SimpleAnd::
   :: Each pattern token must appear in the token stream. Order does not matter.
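
The two pattern types above could be checked along the following lines. This is
a minimal sketch, not the classifier's actual implementation: the function
names are hypothetical, and the token stream is assumed to be a plain Python
list of strings compared for exact equality.

```python
def simple_ordered_and(pattern_tokens, token_stream):
    """True if every pattern token occurs in the stream, in list order.

    Unmatched tokens may sit between the matched ones.
    """
    position = 0
    for token in pattern_tokens:
        try:
            # Search only past the previous match so that order is enforced.
            position = token_stream.index(token, position) + 1
        except ValueError:
            return False
    return True


def simple_and(pattern_tokens, token_stream):
    """True if every pattern token occurs somewhere in the stream, in any order."""
    return set(pattern_tokens).issubset(token_stream)


stream = ["a_NUM", "a", "_NUMxyz", "_NUM", "xyz"]
print(simple_ordered_and(["a", "xyz"], stream))
print(simple_and(["xyz", "a"], stream))
```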
@@ -173, +171 @@

  The following rank types are defined:
  
   Strong::
-  :: Strong patterns are ranked higher than Weak and None. The RankValue is ignored and they are ranked by their position in the pattern stream. The lower the position, the higher the rank. When a Strong pattern is found, the pattern matching step can stop and this pattern can be returned without analyzing the rest of the stream. This is because its impossible for another pattern to rank higher.
+  :: Strong patterns are ranked higher than Weak and None. The Rank``Value is ignored and they are ranked by their position in the pattern stream. The lower the position, the higher the rank. When a Strong pattern is found, the pattern matching step can stop and this pattern can be returned without analyzing the rest of the stream. This is because it's impossible for another pattern to rank higher.
  
   Weak::
-  :: Weak patterns are ranked below Strong but above None. A Weak pattern can only be returned in the absence of a Strong pattern. Weak patterns always rank higher than None patterns, regardless of the RankValue. The RankValue is used to rank between successfully matched Weak patterns.
+  :: Weak patterns are ranked below Strong but above None. A Weak pattern can only be returned in the absence of a Strong pattern. Weak patterns always rank higher than None patterns, regardless of their Rank``Value. The Rank``Value is used to rank between successfully matched Weak patterns.
  
   None::
-  :: None patterns are ranked below Strong and Weak. A None pattern can only be returned in the absence of successful Strong and Weak patterns. The RankValue is used to rank between successfully matched None patterns.
+  :: None patterns are ranked below Strong and Weak. A None pattern can only be returned in the absence of successful Strong and Weak patterns. The Rank``Value is used to rank between successfully matched None patterns.
  
- In the case where 2 or more Weak or None patterns have the same RankValue resulting in a tie,
+ In the case where 2 or more Weak or None patterns have the same Rank``Value resulting in a tie,
  the pattern with the longest concatenated matched pattern length is used. If that results in
  another tie, the pattern found first is returned.
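
The ranking and tie-breaking rules above can be sketched as a sort key. This is
an illustration only, and several points are assumptions rather than part of
the specification: the Match class and its field names are hypothetical, a
higher RankValue is assumed to rank higher (the text does not state the
direction), and position is assumed to be the order in which a pattern was
found, lower meaning earlier.

```python
from dataclasses import dataclass
from typing import Tuple

# Strong outranks Weak, which outranks None.
TYPE_ORDER = {"Strong": 2, "Weak": 1, "None": 0}


@dataclass(frozen=True)
class Match:
    pattern_id: str
    rank_type: str                   # "Strong", "Weak" or "None"
    rank_value: int
    matched_tokens: Tuple[str, ...]
    position: int                    # assumption: order found, lower is earlier


def rank_key(match):
    """Build a tuple that compares larger for higher-ranked matches."""
    if match.rank_type == "Strong":
        # RankValue is ignored for Strong; only the position counts.
        return (TYPE_ORDER["Strong"], 0, 0, -match.position)
    matched_length = sum(len(token) for token in match.matched_tokens)
    # Assumption: a higher RankValue ranks higher. Ties fall through to the
    # longest concatenated matched length, then to the pattern found first.
    return (TYPE_ORDER[match.rank_type], match.rank_value,
            matched_length, -match.position)


def best_match(matches):
    """Return the highest-ranked match, or None if nothing matched."""
    return max(matches, key=rank_key, default=None)
```

A real classifier could stop as soon as the first Strong pattern is found, as
the Strong rule describes; the sketch ranks a complete list only to keep the
tie-breaking visible.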
  
@@ -190, +188 @@

  
  === Notes ===
  
- If 2 or more patterns share the same PatternId, then only 1 of their PatternTypes
+ If 2 or more patterns share the same Pattern``Id, then only 1 of their Pattern``Types
- need to match. There is an implied OR between multiple PatternTypes with equal PatternId.
+ needs to match. There is an implied OR between multiple Pattern``Types with equal Pattern``Id.
  
  If more than 1 default is defined, the 1st one found in the Pattern file is used.
  
- 2 or more patterns cannot have identical RankType, RankValue, and matched tokens. Since they will be
+ 2 or more patterns cannot have identical Rank``Type, Rank``Value, and matched tokens. Since they will be
  found at the same time, the pattern the classifier chooses is undefined.
  
  
@@ -236, +234 @@

  
  = Attribute Retrieval =
  
- This step processes the result of the Pattern Matching step. The PatternId is used
+ This step processes the result of the Pattern Matching step. The Pattern``Id is used
- to look up the corresponding attribute map. The patternId and the attribute map
+ to look up the corresponding attribute map. The Pattern``Id and the attribute map
  are returned.
  
  
@@ -254, +252 @@

  
  The attribute map must be immutable.
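
One way to satisfy the immutability requirement in Python is to hand back a
read-only view of a private copy. This is a sketch under assumptions: the
function name retrieve_attributes and the attribute_maps dictionary (keyed by
PatternId) are hypothetical, and the null pattern case is deliberately left
out because the specification still marks it as an open question.

```python
from types import MappingProxyType


def retrieve_attributes(pattern_id, attribute_maps):
    """Look up the attribute map for a PatternId and return both.

    The map is copied and wrapped in a read-only proxy, so callers cannot
    change it and later changes to the source dict do not leak through.
    """
    attributes = attribute_maps[pattern_id]
    return pattern_id, MappingProxyType(dict(attributes))


pattern_id, attributes = retrieve_attributes("p1", {"p1": {"vendor": "example"}})
print(pattern_id, attributes["vendor"])
```

Any attempt to assign to the returned map raises TypeError.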
  
- If a null pattern is returned from the previous step, this must be safely signaled back.
+ If a null pattern is returned from the previous step, this must be safely returned.
  TODO: how?
  
