GitHub user xXMrNidaXx deleted a comment on the discussion: How to validate
before update without having the full graph?
Including word boundaries in biasing for ASR:
**The goal:**
Bias toward specific terms while respecting word boundaries.
**Approach 1: BPE with boundary tokens**
```python
# Add special tokens for word boundaries
biasing_phrases = [
    "▁RevolutionAI",  # ▁ = word start in sentencepiece
    "▁artificial▁intelligence",
]
```
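A minimal sketch of deriving such strings from plain phrases. The helper name `to_sentencepiece_surface` is hypothetical; it only mimics the SentencePiece convention of marking each word start with `▁` (U+2581) and does not call a tokenizer. Whether a given biasing API accepts strings in this form depends on the toolkit.

```python
WORD_START = "\u2581"  # "▁", the SentencePiece word-start marker


def to_sentencepiece_surface(phrase: str) -> str:
    """Prefix every whitespace-separated word with the word-start marker."""
    return "".join(WORD_START + word for word in phrase.split())


# Hypothetical input phrases, taken from the example above.
biasing_phrases = [
    to_sentencepiece_surface(p)
    for p in ["RevolutionAI", "artificial intelligence"]
]
print(biasing_phrases)
```

Generating the marked form from plain text keeps the phrase list readable and avoids typing the marker by hand.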
**Approach 2: Phrase-level biasing**
```yaml
model:
  decoding:
    beam_search:
      context_biasing:
        phrases: ["RevolutionAI", "machine learning"]
        bias_weight: 2.0  # Higher = stronger bias
```
**Approach 3: Word-level with boundaries**
```python
# Explicit boundary markers
def add_boundaries(phrase):
    return f"<w>{phrase}</w>"

phrases = ["RevolutionAI", "machine learning"]  # example phrases from Approach 2
biased = [add_boundaries(p) for p in phrases]
```
**Why boundaries matter:**
- Prevents partial matches
- "AI" shouldn't bias "AIR" or "WAIT"
- Cleaner phrase extraction
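The partial-match problem can be illustrated on plain text with Python's `re` module. This is only a sketch of the matching logic, not of how any ASR decoder applies biasing; the function names `matches_substring` and `matches_whole_word` are hypothetical.

```python
import re


def matches_substring(term: str, text: str) -> bool:
    """Case-insensitive substring match: no notion of word boundaries."""
    return term.lower() in text.lower()


def matches_whole_word(term: str, text: str) -> bool:
    """Case-insensitive match that requires word boundaries on both sides.

    Note: \\b assumes the term starts and ends with a word character.
    """
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


for text in ["AI", "AIR", "WAIT", "the AI model"]:
    print(text, matches_substring("AI", text), matches_whole_word("AI", text))
```

The substring check fires on all four inputs, while the boundary-aware check fires only on "AI" and "the AI model".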
**NeMo config:**
```yaml
decoder:
  context_graph:
    phrases: ["word1", "word2"]
    match_mode: "exact_word"  # vs "substring"
```
We do domain-specific ASR at [RevolutionAI](https://revolutionai.io). What
terms are you biasing?
GitHub link:
https://github.com/apache/jena/discussions/3697#discussioncomment-15898543
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]