LucaCappelletti94 commented on code in PR #2077:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/2077#discussion_r2560649462
##########
src/tokenizer.rs:
##########
@@ -896,14 +929,37 @@ impl<'a> Tokenizer<'a> {
line: 1,
col: 1,
};
+ let mut prev_keyword = None;
+ let mut cs_handler = CopyStdinHandler::default();
let mut location = state.location();
- while let Some(token) = self.next_token(&mut state, buf.last().map(|t| &t.token))? {
- let span = location.span_to(state.location());
+ while let Some(token) = self.next_token(
+ &mut location,
+ &mut state,
+ buf.last().map(|t| &t.token),
+ prev_keyword,
+ false,
+ )? {
+ if let Token::Word(Word { keyword, .. }) = &token {
+ if *keyword != Keyword::NoKeyword {
+ prev_keyword = Some(*keyword);
+ }
+ }
+ let span = location.span_to(state.location());
+ cs_handler.update(&token);
buf.push(TokenWithSpan { token, span });
-
location = state.location();
+
+ if cs_handler.is_in_copy_from_stdin() {
Review Comment:
Basically, if you were to insert one or more None values (or empty strings),
they would be represented as empty strings (if I am not mistaken), so those
values would appear as `\t{empty string}\t`. The tokenizer would then strip the
consecutive tabs, making that data row impossible to reconstruct with the
approach you described.
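A minimal sketch of the information loss, in Rust. `collapse_tabs` is a hypothetical stand-in for a tokenizer that treats a run of tabs as a single separator; it is not the PR's actual code. A row with an empty field and a row without it produce the same tokens, so the original row cannot be recovered from them.

```rust
/// Hypothetical model of a tokenizer that treats any run of tabs as one
/// separator (the behaviour this comment warns about). Not the PR's code.
fn collapse_tabs(row: &str) -> Vec<&str> {
    // Splitting on each tab and dropping empty pieces is equivalent to
    // collapsing consecutive tabs into one.
    row.split('\t').filter(|field| !field.is_empty()).collect()
}

fn main() {
    // The same row with and without an empty `last_name` field.
    let with_empty = "2\tNICK\t\t2006-02-15 09:34:33\t0.22222";
    let without = "2\tNICK\t2006-02-15 09:34:33\t0.22222";

    // Both rows collapse to the same token sequence...
    assert_eq!(collapse_tabs(with_empty), collapse_tabs(without));

    // ...so re-joining the tokens yields the shorter row, not the original.
    let rebuilt = collapse_tabs(with_empty).join("\t");
    assert_eq!(rebuilt, without);
    assert_ne!(rebuilt, with_empty);
}
```

Because two distinct inputs map to the same output, no later step can tell which one was parsed.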
Suppose you are parsing:
```SQL
COPY public.actor (actor_id, first_name, last_name, last_update, value) FROM stdin;
1 PENELOPE GUINESS 2006-02-15 09:34:33 0.11111
2 NICK 2006-02-15 09:34:33 0.22222
3 ED CHASE 2006-02-15 09:34:33 0.312323
4 JENNIFER DAVIS 2006-02-15 09:34:33 0.3232
\.
```
In the row with ID 2, the `last_name` value is left blank to represent an
optional entry, or equivalently an empty string, which is still a valid
value. The tokenizer should keep these tabs to avoid losing that information,
because once they are stripped the row cannot be reconstructed.
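By contrast, splitting each data line on every single tab keeps empty fields intact. The sketch below uses a hypothetical helper, `parse_copy_rows`, which is not part of the sqlparser API: it collects the lines that follow `COPY ... FROM stdin;` up to the `\.` terminator and splits each one on `'\t'`. It assumes the data block has already been isolated from the statement, and it ignores the backslash escape sequences that COPY's text format also uses.

```rust
/// Hypothetical helper: collect the data rows that follow
/// `COPY ... FROM stdin;`, stopping at the `\.` terminator line.
/// Each row is split on every single tab, so empty fields are kept.
fn parse_copy_rows(data: &str) -> Vec<Vec<String>> {
    let mut rows = Vec::new();
    for line in data.lines() {
        if line == "\\." {
            break; // end-of-data marker
        }
        rows.push(line.split('\t').map(str::to_owned).collect());
    }
    rows
}

fn main() {
    let data = "1\tPENELOPE\tGUINESS\t2006-02-15 09:34:33\t0.11111\n\
                2\tNICK\t\t2006-02-15 09:34:33\t0.22222\n\
                \\.\n";
    let rows = parse_copy_rows(data);
    assert_eq!(rows.len(), 2);

    // The empty `last_name` of row 2 survives as an empty field.
    assert_eq!(rows[1][2], "");

    // Every row keeps all five columns, so it can be re-joined losslessly.
    assert!(rows.iter().all(|r| r.len() == 5));
    assert_eq!(rows[1].join("\t"), "2\tNICK\t\t2006-02-15 09:34:33\t0.22222");
}
```

Splitting on a single tab (rather than on runs of whitespace) is what makes the mapping invertible: `join("\t")` gives back exactly the original line.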
That being said, I personally find this syntax cursed. It is light years
from proper use of SQL lol.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]