iffyio commented on code in PR #2077:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2077#discussion_r2552392047
##########
src/tokenizer.rs:
##########
@@ -896,14 +929,37 @@ impl<'a> Tokenizer<'a> {
             line: 1,
             col: 1,
         };
+        let mut prev_keyword = None;
+        let mut cs_handler = CopyStdinHandler::default();
         let mut location = state.location();
-        while let Some(token) = self.next_token(&mut state, buf.last().map(|t| &t.token))? {
-            let span = location.span_to(state.location());
+        while let Some(token) = self.next_token(
+            &mut location,
+            &mut state,
+            buf.last().map(|t| &t.token),
+            prev_keyword,
+            false,
+        )? {
+            if let Token::Word(Word { keyword, .. }) = &token {
+                if *keyword != Keyword::NoKeyword {
+                    prev_keyword = Some(*keyword);
+                }
+            }
+            let span = location.span_to(state.location());
+            cs_handler.update(&token);
             buf.push(TokenWithSpan { token, span });
-
             location = state.location();
+
+            if cs_handler.is_in_copy_from_stdin() {
Review Comment:
Yeah, so the thinking was that the tokenizer stays, as usual, agnostic of the context.
Using this as an example input:
```sql
COPY actor (actor_id) FROM STDIN;
1 PENELOPE
2 NICK
\.
```
The tokenizer produces the token stream:
```rust
[COPY, actor, (, actor_id, ), FROM, STDIN, ;, 1, PENELOPE, 2, NICK, \, .]
```
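(For illustration, roughly this stream can be reproduced with the crate's existing `Tokenizer`; a minimal sketch, with the caveat that `tokenize()` also emits `Token::Whitespace` entries, which the list above elides:)
```rust
use sqlparser::dialect::PostgreSqlDialect;
use sqlparser::tokenizer::Tokenizer;

fn main() {
    let sql = "COPY actor (actor_id) FROM STDIN;\n1\tPENELOPE\n2\tNICK\n\\.\n";
    // `tokenize()` returns the full stream, whitespace tokens included.
    let tokens = Tokenizer::new(&PostgreSqlDialect {}, sql).tokenize().unwrap();
    println!("{tokens:?}");
}
```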
Then the parser, after parsing the `STDIN;`, implicitly knows that the delimiter is whitespace. It then consumes the tokens `[1, PENELOPE, ...]`, together with the whitespace in between, to build the actual CSV string. (See the sketch below.)
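As an illustrative sketch of that consumption step (the function name and the exact termination check are my assumptions, not the PR's actual code):
```rust
use sqlparser::tokenizer::Token;

/// Hypothetical sketch: after `STDIN;` has been parsed, walk the remaining
/// tokens (whitespace included) and concatenate their original text until
/// the `\.` end-of-data marker.
fn collect_stdin_payload(tokens: &[Token]) -> String {
    let mut out = String::new();
    let mut iter = tokens.iter().peekable();
    while let Some(tok) = iter.next() {
        // `\` immediately followed by `.` terminates the data section.
        if matches!(tok, Token::Backslash) && matches!(iter.peek(), Some(Token::Period)) {
            break;
        }
        // `Token` implements `Display`, so each token contributes its
        // source text (words, numbers, and whitespace alike).
        out.push_str(&tok.to_string());
    }
    out
}
```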