ryanrussell opened a new issue, #8120:
URL: https://github.com/apache/arrow-rs/issues/8120

   # ARROW-012-015: Remaining Arrow 56.0 Breaking Changes
   
   ## Problem
   Several smaller but critical breaking changes in Arrow 56.0 need linter 
coverage for complete migration support.
   
   ## API Changes Details
   
   ### ARROW-012: Int96::to_i64() Removal (Parquet)
   ```rust
   -pub fn parquet::data_type::Int96::to_i64(&self) -> i64
   ```
   
   ### ARROW-013: Parquet Level Enum Removal
   ```rust
   -pub enum parquet::column::writer::Level
   -pub parquet::column::writer::Level::Column  
   -pub parquet::column::writer::Level::Page
   ```
   
   ### ARROW-014: Field Dictionary Behavior Changes
   **Behavioral breaking change** in commit ed0213143:
   - `Field::try_merge()` no longer checks `dict_id` equality
   - `Field::contains()` no longer compares `dict_id` fields
   This can cause subtle bugs in schema merging logic.
   
   ### ARROW-015: New Decimal Types (Enhancement Detection)
   ```rust
   +pub arrow_schema::DataType::Decimal32(u8, i8)
   +pub arrow_schema::DataType::Decimal64(u8, i8)
   ```
   
   ## Implementation Task
   Create `src/rules/arrow_012_015_misc_breaking_changes.rs` with these 
specifications:
   
   ### Rule Implementation
   ```rust
   use regex::Regex;
   use std::path::Path;
   use crate::output::{Issue, Severity};
   use crate::rules::Rule;
   
   /// ARROW-012-015: Detect miscellaneous breaking changes in Arrow 56.0
   /// 
   /// Covers Int96::to_i64 removal, Level enum removal, Field behavior changes,
   /// and opportunities to use new Decimal32/64 types.
   pub struct Arrow012015Rule;
   
   impl Rule for Arrow012015Rule {
       fn rule_id(&self) -> &'static str {
           "ARROW-012-015"
       }
   
       fn check_rust_source(&self, file_path: &Path, content: &str) -> 
Result<Vec<Issue>, Box<dyn std::error::Error>> {
           let mut issues = Vec::new();
           
           for (line_num, line) in content.lines().enumerate() {
               // ARROW-012: Int96::to_i64() removal
               if line.contains(".to_i64()") && (line.contains("Int96") || 
line.contains("int96")) {
                   if let Some(mat) = 
Regex::new(r"\.to_i64\s*\(\s*\)")?.find(line) {
                       issues.push(Issue {
                           rule_id: "ARROW-012".to_string(),
                           severity: Severity::Error,
                           message: "Int96::to_i64() method was removed in 
Arrow 56.0".to_string(),
                           file_path: file_path.to_path_buf(),
                           line: line_num + 1,
                           column: mat.start() + 1,
                           suggestion: Some("Use manual conversion: 
(int96.data[0] as i64) | ((int96.data[1] as i64) << 32). Note: This loses 
precision if the value exceeds i64 range.".to_string()),
                           auto_fixable: false,
                           deprecated_since: Some("56.0.0".to_string()),
                           changelog_url: 
Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
                           migration_guide_url: 
Some("https://arrow.apache.org/docs/rust/migration_guide.html#int96-to-i64".to_string()),
                       });
                   }
               }
               
               // ARROW-013: Level enum usage
               if let Some(mat) = 
Regex::new(r"Level::(Column|Page)\b")?.find(line) {
                   issues.push(Issue {
                       rule_id: "ARROW-013".to_string(),
                       severity: Severity::Error,
                       message: "parquet::column::writer::Level enum was 
removed in Arrow 56.0".to_string(),
                       file_path: file_path.to_path_buf(),
                       line: line_num + 1,
                       column: mat.start() + 1,
                       suggestion: Some("The Level enum and its variants 
(Column, Page) were removed. Use the appropriate writer configuration methods 
directly.".to_string()),
                       auto_fixable: false,
                       deprecated_since: Some("56.0.0".to_string()),
                       changelog_url: 
Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
                       migration_guide_url: 
Some("https://arrow.apache.org/docs/rust/migration_guide.html#level-enum-removal".to_string()),
                   });
               }
               
               // ARROW-014: Field merge/contains behavior change warnings
               if line.contains("try_merge") || line.contains("contains") {
                   if line.contains("Field") || line.contains("field") {
                       if let Some(mat) = 
Regex::new(r"\.(try_merge|contains)\s*\(")?.find(line) {
                           issues.push(Issue {
                               rule_id: "ARROW-014".to_string(),
                               severity: Severity::Warning,
                               message: "Field::try_merge() and 
Field::contains() behavior changed in Arrow 56.0".to_string(),
                               file_path: file_path.to_path_buf(),
                               line: line_num + 1,
                               column: mat.start() + 1,
                               suggestion: Some("These methods no longer check 
dict_id equality. If you depend on dict_id comparison, add explicit checks: 
field1.dict_id == field2.dict_id".to_string()),
                               auto_fixable: false,
                               deprecated_since: Some("56.0.0 (behavior 
change)".to_string()),
                               changelog_url: 
Some("https://github.com/apache/arrow-rs/pull/7968".to_string()),
                               migration_guide_url: 
Some("https://arrow.apache.org/docs/rust/migration_guide.html#field-dict-id-behavior".to_string()),
                           });
                       }
                   }
               }
               
               // ARROW-015: Suggest new Decimal32/64 types for small precision 
decimals
               if let Some(mat) = 
Regex::new(r"DataType::Decimal128\s*\(\s*([1-9]|1[0-8]),")?.find(line) {
                   if let Some(captures) = 
Regex::new(r"DataType::Decimal128\s*\(\s*(\d+),")?.captures(line) {
                       if let Ok(precision) = captures[1].parse::<u8>() {
                           let suggestion = if precision <= 9 {
                               "Consider using DataType::Decimal32 for better 
performance with precision ≤9"
                           } else if precision <= 18 {
                               "Consider using DataType::Decimal64 for better 
performance with precision ≤18"  
                           } else {
                               continue;
                           };
                           
                           issues.push(Issue {
                               rule_id: "ARROW-015".to_string(),
                               severity: Severity::Info,
                               message: "New smaller decimal types available in 
Arrow 56.0".to_string(),
                               file_path: file_path.to_path_buf(),
                               line: line_num + 1,
                               column: mat.start() + 1,
                               suggestion: Some(suggestion.to_string()),
                               auto_fixable: false,
                               deprecated_since: None,
                               changelog_url: 
Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
                               migration_guide_url: 
Some("https://arrow.apache.org/docs/rust/migration_guide.html#new-decimal-types".to_string()),
                           });
                       }
                   }
               }
           }
           
           Ok(issues)
       }
   
       fn check_cargo_toml(&self, _file_path: &Path, _content: &str) -> 
Result<Vec<Issue>, Box<dyn std::error::Error>> {
           Ok(Vec::new())
       }
   }
   ```
   
   ### Tests Required
   ```rust
   #[cfg(test)]
   mod tests {
       use super::*;
       use std::path::PathBuf;
   
       #[test]
       fn test_detects_int96_to_i64() {
           let rule = Arrow012015Rule;
           let content = r#"
   use parquet::data_type::Int96;
   
   fn convert_int96(value: Int96) -> i64 {
       value.to_i64() // Should trigger ARROW-012
   }
   "#;
           let issues = rule.check_rust_source(&PathBuf::from("test.rs"), 
content).unwrap();
           assert\!(issues.iter().any(|i| i.rule_id == "ARROW-012"));
       }
   
       #[test]
       fn test_detects_level_enum() {
           let rule = Arrow012015Rule;
           let content = "let level = Level::Column;";
           let issues = rule.check_rust_source(&PathBuf::from("test.rs"), 
content).unwrap();
           assert\!(issues.iter().any(|i| i.rule_id == "ARROW-013"));
       }
   
       #[test]
       fn test_warns_field_behavior_change() {
           let rule = Arrow012015Rule;
           let content = r#"
   if field1.contains(&field2) {
       field1.try_merge(&field2)?;
   }
   "#;
           let issues = rule.check_rust_source(&PathBuf::from("test.rs"), 
content).unwrap();
           assert_eq\!(issues.iter().filter(|i| i.rule_id == 
"ARROW-014").count(), 2);
           assert\!(issues.iter().any(|i| matches\!(i.severity, 
Severity::Warning)));
       }
   
       #[test]
       fn test_suggests_decimal32() {
           let rule = Arrow012015Rule;
           let content = "DataType::Decimal128(5, 2)";
           let issues = rule.check_rust_source(&PathBuf::from("test.rs"), 
content).unwrap();
           assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015" && 
matches\!(i.severity, Severity::Info)));
       }
   
       #[test]
       fn test_suggests_decimal64() {
           let rule = Arrow012015Rule;
           let content = "DataType::Decimal128(15, 4)";
           let issues = rule.check_rust_source(&PathBuf::from("test.rs"), 
content).unwrap();
           assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015"));
       }
   
       #[test]
       fn test_no_suggestion_for_large_precision() {
           let rule = Arrow012015Rule;
           let content = "DataType::Decimal128(25, 8)"; // Precision > 18
           let issues = rule.check_rust_source(&PathBuf::from("test.rs"), 
content).unwrap();
           assert\!(\!issues.iter().any(|i| i.rule_id == "ARROW-015"));
       }
   }
   ```
   
   ### Integration Steps
   1. Add to `src/rules/mod.rs`:
      ```rust
      pub mod arrow_012_015_misc_breaking_changes;
      // In RuleEngine::new():
      
rules.push(Box::new(arrow_012_015_misc_breaking_changes::Arrow012015Rule));
      ```
   
   2. Test patterns:
      ```rust
      int96_val.to_i64()                    // ARROW-012 Error
      Level::Column                         // ARROW-013 Error  
      field.try_merge(&other)              // ARROW-014 Warning
      DataType::Decimal128(7, 2)           // ARROW-015 Info
      ```
   
   ## Acceptance Criteria
   - ✅ Detects Int96::to_i64() usage (Error)
   - ✅ Detects Level enum variants (Error)  
   - ✅ Warns about Field behavior changes (Warning)
   - ✅ Suggests better decimal types (Info)
   - ✅ Appropriate severity levels for each issue type
   - ✅ Comprehensive test coverage
   
   ## Migration Summary
   - **ARROW-012**: Manual Int96 conversion required
   - **ARROW-013**: Level enum completely removed  
   - **ARROW-014**: Behavior change in Field methods
   - **ARROW-015**: Performance improvement opportunity


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to